

Prepared  under  Electronic  Systems  Division  Contract  AF  19(628)-500  by 

Lincoln  Laboratory 

MASSACHUSETTS  INSTITUTE  OF  TECHNOLOGY 
Lexington,  Massachusetts 


The work reported in this document was performed at Lincoln
Laboratory, a center for research operated by Massachusetts
Institute of Technology, with the support of the U.S. Air Force
under Contract AF 19(628)-500.




MASSACHUSETTS  INSTITUTE  OF  TECHNOLOGY 
LINCOLN  LABORATORY 


SEQUENTIAL  MEASUREMENT 
OF  MULTIDIMENSIONAL  TRANSDUCERS 


J. R. SKLAR
Group  22 


LINCOLN  LABORATORY 
TECHNICAL  REPORT  360 

RESEARCH  LABORATORY  OF  ELECTRONICS 
TECHNICAL  REPORT  431 

29  OCTOBER  1964 


LEXINGTON, MASSACHUSETTS


SEQUENTIAL  MEASUREMENT  OF  MULTIDIMENSIONAL  TRANSDUCERS* 

ABSTRACT 

Although  the  problem  of  decoding  tree-encoded  messages  in  communications  and  that  of  measuring 
the  parameters  which  describe  a  multidimensional  transducer  appear  very  different  at  first,  striking 
similarities  arise  upon  closer  scrutiny.  These  similarities  are  most  evident  when  each  successive 
transducer  output  depends  on  an  additional  transducer  parameter.  Because  of  these  similarities  and 
because  sequential  decoding  has  been  so  successful  in  decoding  tree-encoded  messages,  a  study  of 
the  application  of  sequential  decoding  algorithms  to  measurements  was  undertaken. 

This report analyzes a sequential algorithm suggested by R. M. Fano, Massachusetts Institute of
Technology, and describes its application to measurement problems. From the analysis, bounds on the
average number of computations needed to estimate one parameter are obtained. A bound is also
derived for the probability of estimating at least one parameter of a set incorrectly. It will become
apparent that when an attempt is made to differentiate between parameter values that produce too small
an effect on the output, relative to the noise, the sequential method will fail. This difficulty
determines a limit to the precision obtainable with the sequential method. This critical level may be
likened to the computational cutoff rate in the corresponding communication problem.

A  series  of  simulation  experiments  was  performed  to  test  the  hypotheses  and  results  of  the  theory. 
These  experiments  consisted  of  estimating  the  characteristic  impedance  values  of  the  sections  of  a 
transmission  line  constructed  of  many  short  segments.  This  problem  displays  many  of  the  features 
characteristic of geophysical layer determination. Although the theoretical and simulated
measurement problems were not identical, the theoretical and experimental results agree, at least
qualitatively. Thus it appears that further research is warranted on the application of sequential decoding
to  actual  measurement  problems. 


I.  INTRODUCTORY  REMARKS 

A.  Introduction 

One  of  the  traditional  areas  of  interest  to  the  electrical  engineer  has  been  the  design  of 
measurement  equipment.  Historically,  he  first  concentrated  on  measuring  a  single  unknown 
parameter,  trying  to  do  so  with  a  minimum  of  interference  from  other  quantities.  Then  as  time 
went  on,  it  became  necessary  to  measure  two  unknowns  simultaneously  and  the  complexity  of 
measurement  techniques  increased.  Today,  the  number  of  unknowns  in  measurement  problems 
is typically even larger. We are therefore forced to develop techniques applicable to the
measurement of a large number of parameters from data which depend on many of them simultaneously.

The  interpretation  of  the  data  from  such  measurements  is  quite  complicated.  In  particular, 
the  data  required  to  measure  one  parameter  may  depend  on  some  of  the  other  parameters  whose 
values are not determined. Ideally, we could quantize the values of the parameters to some
acceptable degree of precision, form all possible combinations of values for the system parameters,
and determine from the instrument's internal relations the output for each such combination. Then
we could compare the actual output with each of these postulated outputs, and choose as the
measurement result that set of system parameters which produces the most favorable comparison.

However, if each parameter takes on D values and there are N parameters, the number of
combinations is D^N, which is extremely large even for relatively small values of D and N. It is
therefore desirable to develop procedures not characterized by this exponential growth in
computational load.
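The growth can be made concrete with a short sketch. The function names and the sample values of D and N below are illustrative assumptions, not quantities taken from the report.

```python
from itertools import product

def exhaustive_hypotheses(levels, n_params):
    """Lazily enumerate every combination of quantized parameter values."""
    return product(range(levels), repeat=n_params)

def hypothesis_count(levels, n_params):
    """Number of combinations an exhaustive comparison must test: D**N."""
    return levels ** n_params

# Small case: D = 3 levels and N = 4 parameters can still be listed outright.
small = list(exhaustive_hypotheses(3, 4))
assert len(small) == hypothesis_count(3, 4)

# Modest case: D = 10 levels and N = 20 parameters is far too many to enumerate.
print(hypothesis_count(10, 20))
```

Doubling N squares the count, which is why a procedure whose work grows only linearly with N is attractive.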

In this report, we consider such a problem. More specifically, we define a class of
multidimensional measurement problems endowed with a so-called tree structure, and consider in
detail an algorithm designed to determine the N unknown parameters by a number of computations
that grows only linearly with N. The particular algorithm analyzed was introduced by Fano¹ for
sequentially decoding tree-encoded messages transmitted over communication channels. We
shall show how this technique can also be applied to measurements.

B.  Measurement  Problem 

In  most  measurement  problems,  an  observer  attempts  to  assign  estimated  values  to  a  set  of 
unknown  system  parameters.  We  assume  throughout  the  report  that  the  observer  knows  which 
parameters  characterize  the  system  being  measured  and  that  he  also  knows  the  range  of  these 
parameters.  With  this  information,  the  observer  will  be  able  to  construct  a  general  model  of 
the system being measured and then, by estimating the unknown parameters, he will be able to
characterize it completely. Perhaps it is required that the estimates of the parameters satisfy
some precision criterion. Generally, there is noise corrupting the measurement, thereby making
the job more difficult. If this noise is too severe, it may be impossible to estimate the
parameters with less than some specific error. Hopefully, analysis of the particular measurement
problem permits the observer to determine in advance whether a specified measurement
technique will satisfy the precision criterion.

A model of the system being measured, together with the measuring equipment, can be
constructed as in Fig. 1. The probe signal, under the observer's control, enters the system which
is described by the unknown parameters, and reacts with it. The result of the reaction is an
output which is usually corrupted by noise before it becomes available to the observer. This
distorted output then becomes available for processing, and the observer has the option of
choosing the processing technique that will provide the best possible measurement.


Fig. 1. Generalized measurement equipment. (Diagram labels: DESIGNED BY OBSERVER; NOT UNDER
OBSERVER'S CONTROL; DESIGNED BY OBSERVER.)


In  the  cases  of  principal  interest,  the  output  depends  on  several  parameters  simultaneously. 
Assigning  estimated  values  to  these  parameters  (under  a  maximum  likelihood  criterion)  involves 
finding  the  set  of  parameter  values  which  maximizes  the  probability  of  the  output,  conditioned 
on  these  values.  Since  several  parameters  determine  the  output,  one  must  find  the  maximum  of 
a function of several variables. This search is known as a multidimensional "hill climb." Since
the  sequential  decoding  algorithms  used  for  decoding  tree-encoded  messages  perform  such  a 
hill  climb  in  an  efficient  manner,  the  possibility  of  using  an  analogous  procedure  here  suggests 
itself. 

In  the  remainder  of  this  report,  we  restrict  our  attention  to  additive  noise,  since  it  is  the 
type  most  frequently  encountered  in  measurement  problems.  On  the  basis  of  this  assumption, 
we adopt the following terminology as illustrated in Fig. 1. Let s be a vector† with enough
components to represent the probe signal; let h be a similar vector describing the unknown
parameters; let z be the output of the system being measured when the probe signal s is applied; and
let y be the output available to the observer as a noisy version of z. If n is a vector describing
the noise, the additive noise assumption implies

y  =  z  +  n 

C.  Communication  Problem 

Since the motivation for the application of a sequential algorithm to measurements arose
from certain similarities between measurement problems and communication problems, we shall
discuss the communication problem briefly. The general communication system is shown in
Fig. 2. A message source is generating messages that must be transmitted to a user over a
noisy channel. Because of the noise, the transmitted signal does not arrive at the receiver
exactly as transmitted, but is corrupted by an unwanted effect imposed upon it by the channel. Thus
errors are made in conveying the source message to the user.

† In its most general sense, a vector can be regarded as an ordered set of quantities. Thus a vector of sample
values can be used to represent a time signal and a vector of arbitrary numbers can be used to represent a set of
parameters.


Fig.  2.  Generalized  communication  system. 
It is the communication engineer's job to design the encoder in such a way that the error
probability is as low as possible. Shannon,² in his classic paper, considered this problem and
introduced a measure of the information content of a message. By using this measure, he
defined a rate of transmission in bits per second, and proved that, by proper encoding,
communication over a noisy channel with as low a probability of error as desired is possible, provided
that the rate of transmission does not exceed a fixed quantity, the channel capacity, which is
determined by the noise characteristics of the channel. The proof was purely one of existence,
and did not show explicitly how to construct good codes.

Since Shannon's paper appeared, information theorists have concerned themselves with the
search for coding techniques³ that permit communication with low error probability, and are
also  relatively  simple  to  implement.  The  first  codes  investigated  were  called  block  codes  and 
were  designed  to  use  on  a  binary  channel.  In  these  codes,  a  sequence  of  nR  binary  information 
symbols  is  encoded  into  a  block  of  n  binary  symbols  to  be  transmitted  over  the  channel.  Here 
R,  the  transmission  rate,  is  the  ratio  of  the  number  of  information  bits  to  the  total  number  of 
transmitted  bits.  Shannon  proved  that  there  are  block  codes  which  yield  an  error  probability  that 
decreases  exponentially  with  n,  the  block  length.  The  rate  at  which  this  exponential  decrease 
takes  place  indicates  the  quality  of  the  code. 

Although it was possible to prove the existence of good block codes from ensemble average
arguments, it was difficult to find codes which had sufficient mathematical structure so that they
could be encoded and decoded easily. Much of the difficulty arose from the fact that the block
length n must be quite large to ensure that the error probability be low. Thus the number of
code words 2^{nR} must also be large for the communication to continue at a reasonable rate.
Typically, 10^{30} code words might be used. Ideally, we could compare the received sequence of
symbols with the transmitted sequence for each of these code words and, by some measure of
distance, ascertain which code word is closest to the received word. However, the large number
of comparisons makes this procedure undesirable, particularly since this number grows
exponentially with block length. Those long codes which, because of mathematical structure, are
simply decoded, suffer from a significantly higher error probability than theory shows can be
obtained.
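As a rough check on these magnitudes, the sketch below relates the number of code words 2^{nR} to the number of information bits nR. The helper names and the sample block length and rate are assumptions chosen for illustration only.

```python
import math

def codeword_count(block_length, rate):
    """Number of code words 2**(n*R) in a binary block code of length n and rate R."""
    return 2.0 ** (block_length * rate)

def information_bits(count):
    """Information bits n*R needed to index `count` distinct code words."""
    return math.log2(count)

# 10**30 code words correspond to just under 100 information bits per block.
bits = information_bits(1e30)
assert 99.0 < bits < 100.0

# Hypothetical example: a rate-1/2 code of block length 200 has 2**100 code words.
assert codeword_count(200, 0.5) == 2.0 ** 100
```

An exhaustive comparison against every code word therefore becomes impractical at block lengths of only a few hundred symbols.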

Several years ago, Wozencraft⁴ proposed a sequential decoding procedure for decoding
binary convolutionally encoded messages. As long as the rate did not exceed a particular
quantity R_comp, which is strictly less than the channel capacity, Wozencraft showed that the average
number of computations needed to discard an incorrect symbol grew slowly with the constraint
length (analogous to block length). In addition, under the same rate restriction, this overall
encoding-decoding system gave the same error exponent as that for random block codes. Later
this technique was generalized by Reiffen⁵ for nonbinary alphabets. Recently, Fano¹ suggested an
alternate algorithm to sequentially decode tree-encoded messages. This method could be analyzed
more completely than that of Wozencraft, and was shown to require an average number of
computations per digit that is independent of constraint length.⁶ These sequential decoding techniques
will be described in more detail later.

D.  Measurements  Vs  Communications 

If the measurement problem discussed in Sec. I-B is compared with the communication
problem, some striking similarities appear. In both problems, a known vector quantity† reacts
with an unknown vector quantity to produce a noise-free data vector. In both cases, there is a
noise effect which prevents the user from observing the data vector directly and thereby
determining uniquely and at once the values of the set of unknowns. In both instances, he can perform
an exhaustive search to find the best estimate for these quantities; however, as previously
discussed, this technique is unattractive. The only real difference lies in the form of the reaction
between the known and the unknown vectors.

The transformation from the message symbols to the transmitted symbols carried out in the
encoder for communications, and the transformation from the probe signal to the noise-free data
vector in the measurement problem, may both be represented by the general transformation
T(s,h). In the representation of the communications encoder, let s be the vector of encoding
parameters and h the sequence of message symbols; in the representation of the system
undergoing measurement, let s represent the probe signal and h the unknown parameters. Then in
both communications and measurements, s and T are known to the user and it is his task to
determine h. Thus an additional similarity exists between the measurement and the
communication problems.

However,  it  is  at  this  point  that  a  subtle  difference  arises.  For  in  communications,  the 
choice  of  T  is  at  the  disposal  of  the  user,  whereas  in  measurements,  T,  although  known,  is 
specified by the form of the system being measured. Thus the particular communication
problem analogous to the general measurement problem is the study of a particular encoding
technique where the objective of the study is to develop an efficient decoding procedure and to
ascertain how well this procedure will operate.

Despite  this  difference,  it  is  clear  that  the  number  of  similarities  is  sufficiently  large  to 
suggest  that  an  efficient  communication  technique  might  apply  to  measurement  problems  as  well. 
More  specifically,  we  have  indicated  above  that  the  sequential  decoding  technique  has  permitted 
the  multidimensional  search,  required  to  decode  tree-encoded  messages  in  communications,  to 
be  completed  with  a  reasonable  number  of  computations.  We  have  also  indicated  that  a  similar 
multidimensional  search  occurs  in  interpreting  measurement  data.  Thus  the  possibility  of  using 
a  sequential  method  in  measurement  problems  arises. 

E.  Objectives 

In this report, we investigate the possibility of using a sequential processing method for
measurements. First, we discuss the class of measurement problems which appear amenable
to the application of a sequential method. In this connection, we shall discuss measures by
which we can compare hypothesized noise-free output sequences (z in Fig. 1) with actual data
vectors (y in Fig. 1); we shall define a tree structure which is required for the sequential method
to apply to a measurement problem; we shall suggest a further requirement, called the
differential bias assumption, that guarantees the usefulness of the sequential method; and we shall
introduce examples which seem to satisfy the above two requirements.

† Again we refer to a vector in its most general sense.

After describing the methods suggested by Wozencraft⁴ and Fano¹ for sequential decoding,
we analyze the Fano technique in detail. We show that the average number of computations to
decode one branch of the tree is bounded by a constant. We also demonstrate that the
probability of incorrectly estimating a parameter decreases exponentially with the number of available
output samples dependent upon that parameter. For the case of white, Gaussian noise, graphs
will be presented which show how the decoder's operation depends on the various quantities which
are used to describe the decoder and on the noise level. It will become apparent that when we
try to differentiate between parameter values that produce too small an effect on the output,
relative to the noise, the sequential method will fail. Thus there is a parameter analogous to R_comp,
the rate above which the sequential method fails in communications.

Finally,  the  results  of  a  simulation  of  the  sequential  method  used  on  a  particular  simplified 
measurement  problem  will  be  presented.  It  will  be  seen  that  the  simulated  behavior  is  very 
similar  to  the  calculated  behavior,  thereby  lending  support  to  the  assumptions  made  in  analyzing 
the  sequential  method  as  applied  to  the  measurement  problem.  The  simulation  results  are  for 
a  model  of  the  geophysical  exploration  problem,  and  a  clearer  understanding  of  the  difficulties 
inherent in this problem came about through the simulation. Some thoughts in this area,
particularly in connection with quantizing the unknown parameters, will be presented. Finally, some
suggestions  for  future  research  will  be  made. 

II.  APPLICATION  OF  SEQUENTIAL  METHOD  TO  MEASUREMENTS 

A.  Introduction 

In this section, we consider specifically the application of a sequential method to
measurements. First, we discuss metrics which must be used to define precisely the fit of a hypothesis
to  the  data.  Then  we  set  forth  the  two  requirements  sufficient  to  prove  that  the  Fano  algorithm 
will  be  applicable.  Next,  we  consider  two  examples  toward  which  the  sequential  method  may  be 
applied.  Finally,  we  describe  the  Wozencraft  and  Fano  algorithms. 

B.  Metrics 

In Sec. I-B, we considered estimating a set of parameters h by comparing the output vector
z, resulting from a particular h vector, to the received y vector. To carry out an algorithm,
this notion must be made precise. We consequently define a quantity, hereafter denoted a
metric,† which specifies the degree to which a fit is made.

Before specifying the particular metric that will be considered in this report, we recall the
difference between maximum likelihood and maximum a posteriori estimation. Suppose there is
a set of alternatives {a_i}, each occurring with the a priori probability p(a_i). We are trying to
choose which alternative produced the datum d. First, we could calculate the probability of
each alternative, conditional on the datum, p(a_i | d), and choose as the estimate that alternative
which maximizes this function. This is referred to as maximum a posteriori estimation, since
p(a_i | d) is the a posteriori probability of the alternatives. We note that the calculation is made
from Bayes' rule,

    p(a_j \mid d) = \frac{p(d \mid a_j)\, p(a_j)}{\sum_i p(d \mid a_i)\, p(a_i)}

Thus the a priori probabilities are used to carry out the a posteriori estimation method.

† The term metric is convenient but not strictly proper, since we do not require these metrics to have the
mathematical properties of reflexivity, symmetry, and triangle inequality satisfaction.

Generally,  however,  the  a  priori  probabilities  are  not  known  explicitly.  We  must  then  take 
care not to introduce bias into the metric by the use of uncertain values for the a priori
probabilities. The maximum likelihood approach should therefore be considered.

A maximum likelihood estimate is that value of the unknown parameter which maximizes
the probability p(d | a_i) of the datum, conditional on the parameter value. The maximum
likelihood method has the benefit of being independent of the a priori knowledge, and thus is more
convenient to implement. It is important to note that the maximum likelihood method is
equivalent to the a posteriori probability method if the a priori probabilities are equal.
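The equivalence under equal a priori probabilities can be illustrated with a small numerical sketch. The likelihood and prior values below are hypothetical, as are the function names.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: p(a_j | d) for each alternative, from p(d | a_j) and p(a_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def map_estimate(likelihoods, priors):
    """Index of the alternative with the largest a posteriori probability."""
    post = posterior(likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)

def ml_estimate(likelihoods):
    """Index of the alternative with the largest likelihood p(d | a_j)."""
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)

likelihoods = [0.10, 0.30, 0.25]       # hypothetical p(d | a_i) for three alternatives
equal_priors = [1 / 3, 1 / 3, 1 / 3]
skewed_priors = [0.80, 0.10, 0.10]     # hypothetical, strongly favoring a_0

# With equal priors the two estimates coincide; a skewed prior can change the answer.
assert map_estimate(likelihoods, equal_priors) == ml_estimate(likelihoods)
assert map_estimate(likelihoods, skewed_priors) != ml_estimate(likelihoods)
```

The second assertion shows the sensitivity that motivates the maximum likelihood choice: an uncertain prior can override what the datum itself indicates.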

Discussions of the appropriateness of each technique are common in the statistical
literature and it would serve little purpose to continue them here. Suffice it to say, however, that
if one's ability to perform a measurement depended critically on the a priori probabilities, then
one would have little confidence in the result.

Because we seldom have reliable a priori information available in a measurement problem,
and for the other reasons cited above, we restrict ourselves in this report to a maximum
likelihood approach. Consequently, the decoding metric should be a monotone function of p_n(y | z),
the probability density function of the noise vector.† It is also desirable to define the metric in
such a way that independent contributions to the total are additive. A metric with these
properties is proportional to log p_n(y | z). If the noise samples are indeed independent and identically
distributed, this becomes

    \sum_j \ln p_n(y_j \mid z_j)

On correct paths, the expected value of this metric is

    E\left[ \ln p_n(y \mid z) \right] = -H(N)

where H(N) is the entropy of the noise vector.


We shall see in the discussion of the Fano algorithm that the metric should increase on
correct paths, while it should decrease on all others. Therefore, the metric for that algorithm
will be chosen to be

    M_k = \sum_{j=1}^{k} \left[ R + \ln p_n(y_j \mid z_j) \right] = kR + \sum_{j=1}^{k} d_j

where d_j is the incremental contribution to the metric due to the noise and R is a constant bias
to be chosen later. If R exceeds the noise entropy, this metric will, on the average, be
composed of positive increments on the correct path. If R is not chosen too large, and if the noise
is not too great, it will be shown that the metric will, on the average, decrease on all incorrect
paths.

† The subscript n specifies the noise probability density function.
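The behavior of this biased metric can be sketched numerically for white Gaussian noise. Everything specific in the sketch is an assumption made for illustration: the noise level, the bias margin of 0.5 nat above the per-sample entropy, the binary-valued outputs, and the choice of an incorrect path that differs from the correct one in every sample.

```python
import math
import random

def log_gaussian_pdf(x, sigma):
    """ln p_n(x) for zero-mean Gaussian noise of standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - x ** 2 / (2.0 * sigma ** 2)

def gaussian_entropy(sigma):
    """Differential entropy (in nats) of one Gaussian noise sample."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def path_metric(y, z_hyp, bias, sigma):
    """M_k = sum over j of [R + ln p_n(y_j | z_j)] for a hypothesized output z_hyp."""
    return sum(bias + log_gaussian_pdf(yj - zj, sigma) for yj, zj in zip(y, z_hyp))

rng = random.Random(0)
sigma = 0.5
z_true = [rng.choice((-1.0, 1.0)) for _ in range(2000)]
z_wrong = [-zj for zj in z_true]              # hypothetical incorrect path
y = [zj + rng.gauss(0.0, sigma) for zj in z_true]

bias = gaussian_entropy(sigma) + 0.5          # R a little above the noise entropy
m_correct = path_metric(y, z_true, bias, sigma)
m_wrong = path_metric(y, z_wrong, bias, sigma)

assert m_correct > 0.0    # the metric climbs along the correct path
assert m_wrong < 0.0      # and falls along this incorrect path
```

If the incorrect hypothesis changed the output by much less than the noise level, the second inequality would no longer hold on the average for this bias, which is the failure mode the report goes on to analyze.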

C.  Tree  Structures 

In  the  coupled  parameter  measurement  problem,  the  observer  has  available  the  noisy  data 
vector  y  and  the  probe  signal  s  as  well  as  some  qualitative  information  about  their  relationship. 
This qualitative description is to be made explicit through the estimation of the unknown
parameters designated by h.

A  general  estimation  procedure  for  this  complex  problem  might  consist  of  guessing  values 
for  all  N  components  of  h  and  comparing  the  resultant  z  vector  with  the  received  data  vector 
y.  Then  by  varying  the  h  components  until  all  possible  vectors  are  tested,  the  observer  can 
choose  the  best  fit  to  the  data  vector  y.  As  mentioned  in  the  introduction,  this  would  require  an 
unrealistic  number  of  attempts  for  any  sizable  number  of  h  components. 

Occasionally, it may be possible to find the best fit by guessing an h vector and then
adjusting the guess, a component at a time, until the fit cannot be improved. However, this
procedure has the pitfall of local maxima at which a poor fit gets poorer, no matter how the h
components are individually varied. Another difficulty arising with this method is the so-called
"plateau" problem whereby, for most guesses, the adjustment of any h parameter gives a
negligible change in the fit.

In the class of problems to which the sequential algorithm applies, there is a structure known
as a tree structure which permits these problems to be circumvented and is defined as follows.
Suppose that each h_i is quantized to D levels so that there are D^N possible h vectors. Also
suppose the components of z and h can be ordered so that

    z_1 = f_1(h_1, s)
    z_2 = f_2(h_1, h_2, s)
    \vdots
    z_i = f_i(h_1, h_2, \ldots, h_i, s)

Then a tree can be constructed having nodes which represent the set of all h vectors having a
common initial part. In this tree, a node at depth i represents all h vectors identical in the
first i components. Since z_i is dependent only on the first i h components, a one-to-one
correspondence exists between the D^i nodes at depth i and the D^i sets of h vectors where each set
consists of the D^{N-i} vectors with a common prefix.
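The correspondence between nodes and sets of h vectors can be checked by direct enumeration for a small tree. The function names and the values D = 3, N = 4 are assumptions chosen only to keep the example small.

```python
from itertools import product

def nodes_at_depth(levels, depth):
    """All D**depth prefixes (h_1, ..., h_depth); each prefix identifies one tree node."""
    return list(product(range(levels), repeat=depth))

def vectors_under(prefix, levels, n_params):
    """The D**(N - i) complete h vectors that share the given length-i prefix."""
    tail_length = n_params - len(prefix)
    return [prefix + tail for tail in product(range(levels), repeat=tail_length)]

D, N, depth = 3, 4, 2
nodes = nodes_at_depth(D, depth)
assert len(nodes) == D ** depth

# Every node owns D**(N - depth) vectors, and together the nodes cover all D**N vectors once.
covered = set()
for prefix in nodes:
    group = vectors_under(prefix, D, N)
    assert len(group) == D ** (N - depth)
    covered.update(group)
assert len(covered) == D ** N
```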

Once this tree structure is assumed, it becomes possible to perform the hill climb on an
incremental basis. That is, one can estimate h_1 on the basis of y_1, and then, conditional on
this value for h_1, consider h_2 using y_2 for comparison as well as y_1, etc. If the estimates are
correct, these comparisons will continue to be satisfactory. However, if an error occurs at
one stage due to a large noise sample, and if the effect of this incorrect hypothesis is to make
the succeeding hypothesized z components different from the true z components, the error will
become apparent at later stages. When such evidence appears, the estimation of additional
parameters should be halted, and the processor should concentrate instead on correcting the
error. The sequential decoding algorithms are formalized procedures for making and correcting
these estimates and will be discussed later in this section. First, however, we present examples
of practical measurement interest which possess the tree structure defined above.
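The incremental estimate-and-correct idea described above can be sketched as a plain depth-first search with a fixed threshold. This is a deliberately simplified illustration and is neither the Fano nor the Wozencraft algorithm; the function names, the fixed floor, the Gaussian branch metric, and the small test system are all assumptions.

```python
import math

def branch_metric(y_i, z_i, bias, sigma):
    """One increment R + ln p_n(y_i | z_i) for zero-mean Gaussian noise."""
    log_pdf = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y_i - z_i) ** 2 / (2.0 * sigma ** 2)
    return bias + log_pdf

def backtracking_search(y, predict, values, n_params, bias, sigma, floor):
    """Extend the best-fitting prefix one parameter at a time; back up when the metric < floor."""
    def extend(prefix, metric):
        depth = len(prefix)
        if depth == n_params:
            return prefix
        scored = []
        for value in values:
            candidate = prefix + (value,)
            increment = branch_metric(y[depth], predict(candidate), bias, sigma)
            scored.append((metric + increment, candidate))
        scored.sort(key=lambda pair: pair[0], reverse=True)   # most promising branch first
        for new_metric, candidate in scored:
            if new_metric >= floor:
                found = extend(candidate, new_metric)
                if found is not None:
                    return found
        return None   # every branch failed, so the error lies earlier in the prefix
    return extend((), 0.0)

# Hypothetical system with the tree property: z_i depends only on h_1, ..., h_i.
s = (1, 1, 1, 1)

def predict(prefix):
    i = len(prefix) - 1
    return sum(prefix[j] * s[i - j] for j in range(i + 1))

h_true = (1, 1, -1, 1)
y = [predict(h_true[:i + 1]) for i in range(4)]
y[0] = -0.2   # one large noise sample makes h_1 = -1 look better at first
sigma = 0.5
bias = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2) + 0.5
estimate = backtracking_search(y, predict, (-1, 1), 4, bias, sigma, floor=-3.0)
assert estimate == h_true
```

In this run the search first accepts the wrong value for h_1, finds that neither value of h_2 fits the second sample, and returns to revise h_1. A fixed floor is the crudest possible rule for deciding when to back up; it is used here only to keep the sketch short.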

D.  Example  I.  Impulse  Response  of  Discrete  Linear  Filter 

As  a  relatively  simple  example  illustrating  the  use  of  a  sequential  measurement  procedure, 
we  consider  a  linear,  time-invariant,  time-  and  amplitude-discrete  filter.  Because  of  the  linear 
aspect  of  this  problem,  linear  regression  techniques  can  be  used  to  estimate  the  components  of 
the  filter  in  a  much  less  complex  manner  than  the  sequential  one.  However,  the  linear  filter  is 
simple  and  familiar  enough  to  be  described  easily.  For  completeness,  the  linear  regression 
technique  is  briefly  discussed  in  Appendix  B. 

It  is  assumed  that  the  amplitude  of  the  filter  impulse  response  is  quantized  to  one  bit  (two 
levels)  and  that  a  necessary  and  sufficient  description  of  the  filter  is  given  by  its  response  to  an 
input  pulse  of  unit  amplitude.  In  addition,  the  input  signal  amplitude  is  also  quantized  to  one  bit 
and  is  time  discrete  in  synchronism  with  the  filter  response.  Gaussian  noise  samples  are  added 
to  the  filter  output  and  the  result  is  transmitted  to  the  user,  whose  task  is  to  determine  the  filter 
response  given  the  input  signal  and  the  noisy  output. 

Part  of  the  user's  problem  is  to  determine  a  satisfactory  or  perhaps  even  optimum  (in  some 
sense)  input  signal  subject  to  some  total  energy  constraint.  Of  course,  the  most  obvious  input 
is  a  sequence  of  unit  pulses  spaced  sufficiently  far  apart  to  guarantee  that  the  filter  response  has 
ended  before  a  second  response  due  to  a  second  input  pulse  has  begun.  With  such  an  input,  since 
the symbols are independently disturbed by the noise, the only reasonable strategy is to
determine the filter response components independently on the basis of the output components influenced
by  them.  No  sequential  procedure  suggests  itself  here  and  indeed  none  can  logically  be  proposed, 
since  there  is  no  output  component  influenced  by  more  than  one  component  of  the  filter  response. 

However,  because  he  may  want  to  put  energy  into  the  filter  more  rapidly  than  this  procedure 
allows  under  a  peak-power  constraint,  the  user  may  prefer  to  use  a  more  complex  input  of  shorter 
total duration than is permitted, if outputs are not to overlap. In this instance, a natural
sequential procedure occurs and it is this procedure which will be discussed in the remainder of
this  section. 

The system under consideration consists of an input signal s, a filter response h, an undisturbed
filter output z, a noise sequence n, and a system output y. The components of s and
h take on the values (+1) and (-1), the components of z take on integral values, and n and y
take on values in the continuum. For simplicity, we assume that the duration of h is known to
be M units and that of s is N units.

Before describing the sequential procedure, the ideal measurement technique will be discussed.
The undisturbed output z is an (M + N - 1)-component time-discrete signal and can therefore
be plotted as a vector in an (M + N - 1)-dimensional vector space. The noisy output y can
also be plotted in this same space and, if the noise level is not very high, will be a point not far
from z. Now it is the user's task to determine from y which of the $2^M$ possible vectors is the
actual filter response. Since the input signal s is known, the user could theoretically compute
the $2^M$ z vectors corresponding to the $2^M$ possible h vectors by convolving the known s with
each one of them. Then the maximum likelihood filter response is that corresponding to the z
closest to the output signal y. The effect of choosing s is to move the $2^M$ z vectors in the output
space; optimally, s should be chosen to minimize the probability of confusion between them.

Practically, however, this method of measurement is not feasible, since the number of computations,
$2^M$, grows exponentially with the response duration.
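
The exhaustive procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `convolve_pm1` and `ml_exhaustive` are hypothetical, and squared Euclidean distance is assumed as the measure of closeness, which coincides with maximum likelihood only for independent Gaussian noise samples of equal variance.

```python
from itertools import product

def convolve_pm1(h, s):
    """Noise-free output z_j = sum_i h_i * s_(j-i); it has M + N - 1 components."""
    M, N = len(h), len(s)
    z = [0] * (M + N - 1)
    for i in range(M):
        for k in range(N):
            z[i + k] += h[i] * s[k]
    return z

def ml_exhaustive(s, y, M):
    """Try all 2**M candidate responses; return the one whose z is closest to y."""
    best_h, best_dist = None, float("inf")
    for h in product((+1, -1), repeat=M):
        z = convolve_pm1(h, s)
        # Assumed closeness measure: squared Euclidean distance to y.
        dist = sum((yj - zj) ** 2 for yj, zj in zip(y, z))
        if dist < best_dist:
            best_h, best_dist = list(h), dist
    return best_h
```

The loop body runs $2^M$ times, which is exactly the exponential growth that makes the ideal technique impractical for long responses.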

We immediately note the similarity between this ideal procedure and that existing for the
decoding of convolutionally encoded messages. In that case, too, the ideal method is impractical
because of the exponential growth in the number of computations with constraint length.
The sequential decoding procedure is designed to avoid this exponential growth and it would not
seem surprising that it could be applied to obtain the same advantage in this measurement
problem.

The key to the operation of a sequential procedure is the so-called tree structure. In the
measurement problem the structure arises as follows. The input-output relationship for the
filter is given by the well-known convolution integral (here a summation, because the input and
the filter response are time-discrete and synchronous):

\[ z_j = \sum_{i=0}^{j} h_i \, s_{j-i} \]

The indexing convention implies that only non-negative indices are meaningful. Therefore, we
may write the first few equations as

\[ z_0 = h_0 s_0 \]
\[ z_1 = h_0 s_1 + h_1 s_0 \]
\[ z_2 = h_0 s_2 + h_1 s_1 + h_2 s_0 \]

etc.



Consequently, the two hypotheses for $h_0$ lead to two hypotheses for $z_0$. Given each hypothesis
for $h_0$, the two hypotheses for $h_1$ lead to two hypotheses for $z_1$, etc. The tree is therefore
constructed by considering each path through the tree as a separate filter response and calculating
for each branch the undisturbed filter output that would occur for the corresponding filter
response. This is illustrated in Fig. 3.

After M postulates have been made, the entire filter response is determined. However,
N - 1 components of z have not been compared with the corresponding components of y. Although
no choice remains, these components do contain information about the filter response
components; therefore, they should be used in the measurement procedure. Consequently, there will
be N components of z corresponding to the last branch of the tree. We shall call this set of
components the remainder set.
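
As a small illustration of how the branch outputs and the remainder set follow from the convolution sum, the sketch below labels each branch of one path with its noise-free output. The function name `branch_outputs` is hypothetical, and s is taken to be zero outside its N components.

```python
def branch_outputs(h, s):
    """Noise-free outputs attached to each branch along the path h.

    Branch j (j < M - 1) carries the single component z_j; the last branch
    carries z_(M-1) together with the N - 1 later components, i.e. the
    N-component remainder set.
    """
    M, N = len(h), len(s)

    def z(j):
        # Convolution sum, keeping only indices where s is defined.
        return sum(h[i] * s[j - i] for i in range(M) if 0 <= j - i < N)

    branches = [[z(j)] for j in range(M - 1)]
    branches.append([z(j) for j in range(M - 1, M + N - 1)])
    return branches
```

For a 3-component filter and a 2-component input, the sizes used in Fig. 3, `branch_outputs([1, 1, -1], [1, -1])` gives one output on each of the first two branches and two outputs on the last.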

In  the  next  section,  we  discuss  a  problem  toward  which  the  sequential  procedure  might 
realistically  be  applied. 


E.  Example  II.  Reflection  Study  of  Geophysical  Layers 

In  the  simplified  linear  filter  problem  discussed  in  the  preceding  section,  the  applicability 
of  the  sequential  measurement  technique  came  about  through  the  dispersive  nature  of  the  filter. 
The  first  M  successive  output  pulses  each  depend  on  a  filter  response  component  that  had  not 
affected  the  previous  output  pulses.  Thus  a  tree  structure  arose  and  the  sequential  procedure 
became feasible. However, because the outputs are linear functions of the unknown parameter,
the sequential method is inferior to a linear regression technique which is much less complex
to instrument.

Fig. 3. Tree structure for a 3-component filter and a specific 2-component input.

Fig. 4. Reflections from layered structures. Note: The pulses are labeled in accordance
with the path they followed.

Of more practical interest is a problem in which the outputs are not linear functions of the
unknown parameters, and we choose the geophysical exploration problem as an example for this
discussion. Other examples might include radar investigation of targets with range extent and
telephone line measurements by pulsed inputs. The geophysical problem was chosen partly because
of the readiness with which the sequential technique could be adopted. In addition, it appears
that not all the information available to the observer is utilized in geophysical work because of
a lack of suitable data-processing techniques.

For about fifty years, artificially generated seismic waves have been used in the investigation
of layered structures beneath the earth's surface. Although initially refraction studies were
carried  out  exclusively,  improvements  in  technique  since  World  War  II  have  brought  about  a 
broad  changeover  to  reflection  methods.  Indeed,  in  many  areas  of  geological  exploration,  such 
as  in  petroleum  prospecting,  the  change  is  almost  complete. 

Generally speaking, the earth's structure is one of multiple layers of varying materials and
of varying thicknesses. A seismic wave, initiated by the detonation of several pounds of explosive,
travels downward through the earth's crust and is reflected, in part, at each boundary.
Since the initial blast is pulse-like, pulses from the succeeding layers will arrive at the surface
at later times which depend specifically on the propagating media, the location of the layers, and
the location of the observation point. This is illustrated in Fig. 4(a-b). By observing the arrival
times and amplitudes of these pulses, it is possible to deduce the layered structure of the
subterrain.

The seismic waves propagate through the layers in a manner governed by the wave equation
for an acoustic wave in an elastic medium. These waves travel with a velocity that depends on
the medium, and at a boundary they are partially reflected and partially transmitted. It does
not seem appropriate to discuss the pertinent equations in great detail, since there are many
formal presentations available.⁹ We may say, however, that the equations and their solution are
perfectly analogous to those obtained in the study of electromagnetic plane waves traveling through
dielectric media.

In particular, we can define a characteristic impedance of a medium $Z_0$, which is related to
the velocity of propagation $v$ and the medium's density $\rho$ by

\[ Z_0 = \rho v \]

If a pulse of amplitude A propagating in a medium with characteristic impedance $Z_{01}$ strikes
perpendicularly to the boundary of a second medium with characteristic impedance $Z_{02}$, there
will be a reflected pulse of amplitude

\[ \frac{Z_{02} - Z_{01}}{Z_{02} + Z_{01}} \cdot A \]

and a transmitted pulse of amplitude

\[ \frac{2 Z_{02}}{Z_{02} + Z_{01}} \cdot A \]
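
The two amplitude expressions above translate directly into code. The following is a minimal sketch; the function name `reflect_transmit` and its argument names are chosen here for illustration only.

```python
def reflect_transmit(A, Z01, Z02):
    """Reflected and transmitted amplitudes for a pulse of amplitude A that
    travels in a medium of impedance Z01 and meets a medium of impedance Z02
    at normal incidence."""
    total = Z02 + Z01
    reflected = (Z02 - Z01) / total * A
    transmitted = 2 * Z02 / total * A
    return reflected, transmitted
```

With matched impedances the reflected amplitude is zero and the pulse is transmitted unchanged; in every case the transmitted amplitude equals the sum of the incident and reflected amplitudes.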


On the basis of these amplitudes, it is possible to calculate the entire response of a given structure
to an initial wave in terms of its amplitude and the various acoustic impedances. Note that
multiple reflections may simultaneously arrive at the observer and these must be accounted for
in the calculation.

In the geophysical problem, however, the acoustic impedances are the objective of the measurement.
At some time after the blast, the observed signal will be a very complex function of
the  many  geophysical  parameters.  However,  we  shall  soon  see  that  there  is  a  tree  structure  that 
simplifies  the  processing  and  makes  a  sequential  technique  the  natural  one. 

Note,  first,  that  the  first  response  to  the  observer  is  a  reflection  from  the  first  boundary 
and that its time of arrival indicates the thickness of the first layer while the amplitude, relative
to the amplitude of the initial disturbance, permits the acoustic impedance of the second
layer (assuming that of the first is known) to be determined. The next response is from the
second boundary and gives information about the second layer's thickness and the third's impedance.
Thus  the  layers  may  be  considered  sequentially  and,  as  the  measurement  process  continues,  the 
effects  of  earlier  layers  may  be  removed  from  later  data  points. 
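
The layer-by-layer reasoning above can be sketched in code under strong simplifying assumptions that are not part of the original model: normal incidence, no noise, and primary reflections only (the multiple reflections mentioned earlier are ignored). The function names are hypothetical. The sketch uses the fact that the product of the downward and upward transmission factors at a boundary with reflection factor r is $1 - r^2$.

```python
def primary_amplitudes(Z, A=1.0):
    """Amplitudes of the primary reflections from successive boundaries of a
    stack with impedances Z[0], Z[1], ... (multiple reflections ignored)."""
    amps, gain = [], A
    for Z1, Z2 in zip(Z, Z[1:]):
        r = (Z2 - Z1) / (Z2 + Z1)
        amps.append(gain * r)
        gain *= 1.0 - r * r      # two-way transmission through this boundary
    return amps

def strip_layers(amps, Z_first, A=1.0):
    """Recover the impedances one layer at a time, removing the effect of
    earlier boundaries from each later amplitude."""
    Z, gain = [Z_first], A
    for a in amps:
        r = a / gain                          # reflection factor at this boundary
        Z.append(Z[-1] * (1.0 + r) / (1.0 - r))
        gain *= 1.0 - r * r
    return Z
```

Each impedance is obtained from one amplitude and the previously determined layers, which is the dependence that gives the problem its tree structure.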

From the above description of the seismic reflection problem, we can abstract a simplified
model which was simulated as a basis for testing the sequential measurement technique. Consider
a transmission line of L sections each of the same length. Let the impedance of each section
take one of two possible values. Let the reflected output of the line be available to the
observer disturbed by Gaussian white noise of variance $\sigma^2$. Then the observer's objective is to
determine the $\{Z_{0n}\}$. In doing so, he may choose any input that best satisfies his objective.

Before proceeding to a more detailed description of sequential decoding, a few more remarks
relative to the geophysical exploration problem are in order. When studying the data processing
methods in this area, one is struck by the dearth of precise techniques. Indeed, long-term amplitude
information is being generally discarded in favor of automatic volume control which permits
a constant amplitude on the seismograph record without a need to calibrate. The chief
argument for this approach has been that the amplitude of the test pulse generated by the blast
is too variable. Only recently has the usefulness of the amplitude ratios been noted.¹⁰ In addition,
the majority of the seismographic data gathered in search of petroleum has been reduced
by eye. Consequently, the skill of the reducer is of prime importance and any oversight by him
could result in the waste of an expensive seismic survey.

Thus  there  is  a  strong  need  for  automatic,  precise  data  reduction  techniques.  Perhaps  the 
sequential  measurement  technique  will  provide  the  basis  for  a  practical,  efficient  method  to 
process  data  from  the  seismic  exploration  of  layered  geophysical  structures. 

F.  Sequential  Decoding  (According  to  Wozencraft) 

In the preceding sections, we discussed sequential algorithms in general and indicated some
typical problems to which they may apply. We next describe in detail the two procedures which
have received the most attention. Although the bulk of this work will be concerned with an algorithm
similar to that of Fano, we include for completeness a brief description of the sequential
decoding technique introduced by Wozencraft⁴ and generalized by Reiffen.⁵

The objective in the measurement problem is to determine which of the z vectors is "closest"
to the y vector that has been received. The notion of closeness can be made explicit by defining
a metric which is additive and increases with the size of the noise samples according to
$\log p_n(y/z)$, where $p_n(y/z)$ is the probability density function of the noise vector. Suppose first
that in terms of this quantity, one considers "radii" of constant metric around the received vector
y. Then one may ask if any of the z vectors lies within a radius $r_1$ of y. This question could
be answered by postulating the first j components of h, computing the portion of the z vector
determined by this subset of h components, and determining the portion of the total metric
calculable on the basis of the partial hypothesis. Certainly, if the partial metric exceeds $r_1$,
the total metric will also. We will see later that the average number of computations is reduced
if $r_1$ is varied as the depth into the tree increases. Therefore, those z vectors very distant
from the received y will be eliminated from consideration before many of the components are
tried. Since most of the $2^M$ z vectors are very different from y, the number of computations
will be greatly reduced and it is this reduction that permits, on the average, a linear rather than
exponential growth in computation with N. If the partial metric does not exceed $r_1$, then another
component of h is postulated.

Suppose that none of the z vectors is within $r_1$ of y. In that case, the search is repeated
with a larger radius $r_2 > r_1$. Eventually, the sphere will be enlarged sufficiently to include
one of the z vectors and this one is considered the undisturbed filter output, and the corresponding
filter response becomes the measurement result. It may happen that more than one
z vector falls within an increased value of the radius and as a result the wrong response could be
determined. This event is one of the possibilities for error and it will be assumed conservatively
that whenever it does happen, an error results.

Clearly, the number of computations can be decreased if the radii considered above are
changed as the procedure successively postulates more h components. Since it is unlikely that
a cumulative metric will increase very rapidly for small values of j and then very slowly for
larger values in order that the total metric remains below $r_k$, a set of criterion functions $r_k(j)$
should be used which increase monotonically. This reduces the number of computations by
causing any short path with rapidly increasing cumulative metric to be dropped from further
consideration before the partial distance becomes equal to the maximum allowable distance. Of
course, the correct path may have a metric which first increases rapidly and then much more
slowly. Although such a path may be rejected under this procedure for the kth criterion function,
$r_k(j)$, it will prove to be acceptable for some other criterion function $r_{k'}(j)$, $k' > k$.
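
The search with successive criterion functions can be sketched as follows. This is an illustrative reading of the procedure, not a reproduction of Wozencraft's algorithm: the name `search_with_criteria` is hypothetical, the additive metric is assumed to be the cumulative squared difference between y and the hypothesized z, and only the first M outputs are used (the remainder set is left out for brevity).

```python
def search_with_criteria(s, y, M, criteria):
    """Try each criterion function r_k(j) in turn; return (h, k) for the first
    path whose cumulative metric stays within r_k(j) at every depth j."""

    def extend(path, dist, r):
        j = len(path)
        if j == M:
            return path
        for b in (+1, -1):
            h = path + [b]
            # Hypothesized noise-free output z_j for this partial response.
            z = sum(h[i] * s[j - i] for i in range(j + 1) if j - i < len(s))
            d = dist + (y[j] - z) ** 2
            if d <= r(j):                  # otherwise the whole subtree is dropped
                found = extend(h, d, r)
                if found is not None:
                    return found
        return None

    for k, r in enumerate(criteria):
        found = extend([], 0.0, r)
        if found is not None:
            return found, k
    return None, None
```

A partial path is abandoned as soon as its cumulative metric exceeds the criterion at its depth, so most of the $2^M$ candidates are never examined in full.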

In the analysis of this technique, the number of computations for rejecting the incorrect
branches at a node has been bounded, but the number for accepting the correct branch has not.
The analysis of the Fano procedure permits a complete bound to the number of computations.


G.  Sequential  Decoding  (According  to  Fano) 

To determine the z vector closest to the received y vector, another related procedure,
similar to that developed by Fano for sequential decoding, can also be used. In this procedure,
the paths through the encoding tree are also tested for cumulative distance, but the thresholding
strategy differs greatly. A metric is used which tends to increase when the decoder is on the
correct path and decrease when the incorrect path is followed. With such a measure, the analogous
procedure is to postulate successive branches, to compute the total measure and then to
compare this with a threshold. If the total measure crosses under the threshold, the branch is
considered unacceptable and other branches from the previous node are tried until an acceptable
branch is found stemming from it.

If this cannot be done, the procedure is to back off another node and to test branches stemming
from it against a threshold that is just satisfactory. When the metric is chosen in such a way
that the variation on the correct path will eventually put the total measure on the acceptable side
of the threshold, this search will eventually be successful. If the total does not cross the threshold,
the threshold is adjusted by a multiple of a basic increment to just keep the current total
metric satisfactory. This practice of following the total metric as closely as possible with the
threshold serves to minimize the number of computations with a small loss in error exponent.

As  indicated  in  Sec.  II-B,  the  Fano  algorithm  requires  a  metric  that  increases  on  the  correct 
path and decreases on incorrect ones. We have seen that the metric for n observation intervals

\[ M_n = nR + \sum_{i=1}^{n} d_i \]

where $d_i$ is the incremental contribution and R is a constant bias, has the desired properties.
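
A short numerical sketch may make the behavior of this metric concrete. The increment used below, $d_i = -(y_i - z_i)^2 / (2\sigma^2)$, is an assumption made here for illustration (a Gaussian log-likelihood up to a constant); the definition actually used is the one given in Sec. II-B. The function name `cumulative_metric` is hypothetical.

```python
def cumulative_metric(y, z, sigma, R):
    """Running values M_1, ..., M_n of M_n = n*R + sum(d_i), using the
    assumed increment d_i = -(y_i - z_i)**2 / (2 * sigma**2)."""
    values, total = [], 0.0
    for yi, zi in zip(y, z):
        d = -((yi - zi) ** 2) / (2 * sigma ** 2)
        total += R + d
        values.append(total)
    return values
```

With noise-free data the metric grows by R per interval along the correct hypothesis, whereas a hypothesis whose first output is wrong by 2 starts well below it.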

The decoder will consider branches stemming from a node in order of decreasing metric.
It will record previous decisions by means of a vector variable i(1), i(2), ..., i(n) where i(n) is
the order number of the branch selected by the decoder at depth n in the tree. Such a vector
description of the decoder position requires the use of the first j vector components to determine
the position at depth j.

The algorithm will best be described in connection with the flow chart of Fig. 5. Every time
a branch of the tree is tested, the decoder is situated at the point marked "start." First the increment
to the metric corresponding to the branch under test is computed and added to the cumulative
metric $M_n$. The resulting quantity $M_{n+1}$ is then compared with the current threshold T. If
$M_{n+1} \geq T$, the branch is deemed satisfactory to the decoder which then follows loop A and proceeds to
test a new branch beyond the one just tested. When the successful branch is under test for the
first time, the threshold is raised until it attains its maximum permissible value below $M_{n+1}$.
If the branch has been tested previously, the threshold should remain at the original level.

The remainder of the flow chart deals with unsatisfactory branches. Since the branches
stemming from a node are tested in order of decreasing metric, the failure of one branch immediately
implies the failure of all branches at that node for the present threshold. Therefore, the
decoder must return to a previous node to seek a satisfactory branch. Before testing whether branches
from that node are satisfactory, it is necessary to test the cumulative metric at the node itself.
If this metric is less than T, the decoder lowers the threshold by $T_0$ and then searches to see if
there is a path remaining above the new threshold setting. If it is greater than or equal to T, other
less likely branches are tested to see if they lead to paths remaining above T.

Fig. 5. Flow chart of the sequential decoding procedure (A → B indicates:
set B equal to A).

The decoder must take care not to raise the threshold on a path that has already been tested.
The procedure operates by testing thresholds in order of decreasing value, and if one proves unsatisfactory,
no higher threshold should be used until virgin territory is reached. We see in
Fig. 5 that F = 0 whenever a new path is followed and that F = 1 whenever one is being retraced.
F is set to one whenever a path falls below a threshold $T'$. If the threshold is then lowered to
$T' - T_0$, the decoder will continue to retrace branches already investigated until it finds one that
exceeds $T' - T_0$ but is below $T'$. This is the first new branch to be tested and F is reset to zero.
If the decoder does not lower the threshold, but instead backs up to an earlier node with several
paths above $T'$, it will search a new path only if one remains below $T' + T_0$. Otherwise, the decoder
would have raised the threshold to $T' + T_0$ when it reached this node for the first time.
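
The search described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than a transcription of the flow chart of Fig. 5: instead of the flag F it uses the equivalent textbook test that a node is new when the metric of the preceding node is below $T + T_0$; the branching is binary (+1 or -1); the search stops when depth M is first reached, without using the remainder set; and the branch increment $R - (y_j - z_j)^2 / (2\sigma^2)$ is an assumed form. All function and parameter names (`fano_search`, `filter_branch_metric`, `delta` for $T_0$) are hypothetical.

```python
def fano_search(branch_metric, depth, delta):
    """Fano-style tree search returning a list of `depth` symbols (+1 or -1).

    branch_metric(path, b) is the metric increment for extending the
    hypothesized `path` by symbol b; delta is the threshold increment T_0.
    """
    path, ranks, M = [], [], [0.0]   # M[k] is the cumulative metric at depth k
    T, rank = 0.0, 0                 # threshold; rank of the branch to try next
    while len(path) < depth:
        # Branches leaving the current node, in order of decreasing metric.
        cands = sorted(((branch_metric(path, b), b) for b in (+1, -1)),
                       reverse=True)
        inc, b = cands[rank]
        if M[-1] + inc >= T:
            # Forward move. The new node is unexplored if the node just left
            # could not have supported the next higher threshold.
            first_visit = M[-1] < T + delta
            path.append(b)
            ranks.append(rank)
            M.append(M[-1] + inc)
            if first_visit:
                while T + delta <= M[-1]:
                    T += delta       # raise the threshold as far as allowed
            rank = 0
            continue
        # The branch failed, so every lower-ranked branch fails too: look back.
        while True:
            if not path or M[-2] < T:
                T -= delta           # cannot back up: lower the threshold
                rank = 0
                break
            last_rank = ranks.pop()
            path.pop()
            M.pop()
            if last_rank == 0:
                rank = 1             # try the less likely branch from here
                break
            # Otherwise we came from the least likely branch: keep backing up.
    return path

def filter_branch_metric(s, y, sigma, R):
    """Assumed increment R - (y_j - z_j)**2 / (2 sigma**2) for the binary
    filter problem, with z_j given by the convolution sum."""
    def branch_metric(path, b):
        j = len(path)
        h = path + [b]
        z = sum(h[i] * s[j - i] for i in range(j + 1) if j - i < len(s))
        return R - (y[j] - z) ** 2 / (2 * sigma ** 2)
    return branch_metric
```

For example, `fano_search(filter_branch_metric(s, y, 0.5, 1.0), M, 1.0)` runs the search on the first M samples of y. When the data are nearly noise-free the decoder moves straight down the tree and raises the threshold at every step; retracing occurs only when a noise sample makes an incorrect branch look better than the correct one.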

The operation of the algorithm will be best understood by the reader if he follows its operation
in typical cases in detail. Figure 6 is a sequence of display photographs resulting from the
simulation  of  the  decoder  operating  on  a  model  of  a  geophysical  exploration  problem  of  the  type 
discussed  in  Sec.  II-E.  These  photographs  illustrate  the  more  important  cases  that  occur  during 
the  decoder’s  operation.  This  display  follows  the  acceptance  of  a  choice  in  loop  A  of  the  decoder 
before  the  threshold  is  raised  for  this  newly  accepted  branch. 

H.  Differential  Bias  Assumption 

In Sec. I-D, we noted a basic difference in the freedom available in communications for introducing
redundancy and that available in measurements. In communications, there is the
freedom  to  design  the  encoder  in  a  way  that  will  make  the  set  of  possible  transmitted  sequences 
as  different  among  themselves  as  possible.  Once  such  an  encoder  is  chosen,  certain  parameters 
are  chosen  to  optimize  the  encoder’s  performance.  The  analysis  of  this  performance  is  usually 
based  on  the  average  behavior  over  the  ensemble  of  parameter  values.  We  are  thus  guaranteed 
that  there  is  at  least  one  set  of  parameters  which  would  provide  this  average  behavior. 

In  measurements,  however,  this  possibility  does  not  exist.  Although  we  have  the  freedom 
to  choose  the  probe  signal,  the  set  of  possible  transmitted  sequences  is  highly  constrained  by 
the  device  being  measured.  Consequently,  it  could  well  happen  that  the  various  hypothesized 
parameter  vectors  produce  almost  identical  sequences  of  noise-free  outputs,  no  matter  how  the 
probe  signal  is  chosen.  In  such  a  case,  measurements  that  would  distinguish  among  the  vectors 
would  be  difficult. 

The  notion  of  coding  in  communications  is  different  from  that  in  measurements.  In  both 
areas, coding is essential, since there must be some redundancy in the noise-free data to indicate
to the decoder when it has erred. Unless an incorrect hypothesis at some point leads the
decoder  to  a  node  in  the  tree  at  which  every  hypothesized  output  differs  from  the  correct  output 
for  that  tree  depth,  the  decoder  will  never  be  able  to  ascertain  its  error.  Otherwise  there  would 
always  be  some  incorrect  path  through  the  tree  identical  in  its  output  sequence  to  the  correct 
output  sequence.  In  communications,  this  characteristic  of  the  tree  code  is  obtained  by  reducing 
the  rate  and  picking  the  code  words  at  each  node  independently  and  at  random.  In  measurements, 
the  characteristic  must  be  provided  by  the  device  under  measurement  itself. 

Fig. 6. Oscillographic simulation output.

1. Decoder is in initial state.

2. Metric values for alternatives at first node are computed. Path corresponding to highest
is chosen.

3. Repeated at node 2. Threshold has been raised.

4. At node 3, both metric increments are negative, but one remains above threshold.

5. Metric increments at node 4 were computed; both caused metric to fall below threshold.
Decoder then returned to node 3, found that untested branch fell below threshold, and then
lowered threshold.

6. With lowered threshold, decoder retraces.

9. Branch at node 6 is chosen. Threshold is raised.

10. Branch at node 7 is chosen. Threshold is raised.

11. Branch at node 8 is chosen. Threshold is raised.

12. Branch at node 9 is chosen. Dropping signal-to-noise ratio is becoming apparent.

13. Branch at node 10 is chosen. This branch is incorrect. Threshold is raised.

14. Branch at node 11 is chosen. Threshold is raised. Although on incorrect path, metric
is increasing.

15. Alternatives at node 12 are computed. Both cause metric to fall below threshold.

16. Decoder returns to node 11, where it tries untested branch with highest metric increment.

17. Alternative metric values for this choice are computed, both falling below threshold.
Again decoder returns to node 11 where it finds no more untested branches. It returns
to node 10, finds metric below threshold, lowers it, and then tries branch from node
11 with highest metric value.

18. At node 12, it finds that both alternatives fall below threshold. Decoder returns to node 11.

19. Decoder tries untested branch at node 11 with highest metric increment.

20. At node 12, both alternatives fell below threshold. Returning to node 11, decoder found
no more untested branches and therefore lowered threshold. Then it returned to node 10 to begin
search with this new threshold value.

21. Search moves to node 11. Threshold remains fixed.

22. Branch at node 12 is chosen. Threshold remains fixed.

23. Both alternatives fell below threshold causing untested (with current threshold)
branch with highest metric increment to be checked.

24. Both alternatives fell below threshold. No untested branches remained at node 11.
Decoder then returned to try untested branch at node 10 with highest metric increment.
This is correct path at last.

25. Branch at node 11 is chosen.

26. Branch at node 12 is chosen.

27. Branch at node 12 fell below threshold which was raised just after display 26.
Threshold was lowered again.

28. Branch at node 13 is chosen. Decoder is on right track.


The  constraints  imposed  by  the  system  under  measurement  become  important  in  another 
way also. When analyzing the operation of the sequential algorithm, it will be necessary to consider
the behavior of the metric along incorrect paths as well as its behavior on correct paths.
Since  the  metric  on  the  correct  path  is  a  function  only  of  the  noise  samples,  its  components  are 
independent.  However,  the  metric  on  the  various  incorrect  paths  is  a  function  not  only  of  the 
noise  samples,  but  also  of  the  particular  incorrect  z  values  which  occur  along  the  incorrect  path 
being  considered.  Thus,  in  the  analysis  of  the  metric  on  the  incorrect  path,  we  must  take  into 
account  these  z  values.  Clearly,  such  a  procedure  would  be  cumbersome  since,  in  general, 
every  incorrect  path  would  have  to  be  considered  separately. 

In  the  analysis  of  sequential  decoding  as  applied  to  communications,  this  problem  is  avoided 
by  a  mathematical  artifice  known  as  ensemble  averaging.  Instead  of  considering  the  behavior  of 
the  metric  on  the  set  of  incorrect  paths  for  a  particular  code,  we  consider  the  average  behavior 
on the set of incorrect paths for an ensemble of codes. Over such an ensemble, the output symbols
along incorrect paths are independent, and therefore it is possible to consider all the incorrect
paths simply. From such a result, the existence of a particular code that gives results at least
as good as the average is guaranteed.

An analogous procedure is not plausible in measurements. Even if we could consider an
ensemble of unknown parameters and thereby obtain independence, it is senseless to say there
is a set of unknown parameters which could be measured at least as well as an average. In actuality,
we are trying to measure a particular set of parameters and do not care if there is another
set of parameters on which we could do a better job. We might also consider the ensemble
of input signals, but the constraints imposed by the transformation are usually too strong to permit
any simplifications to result among the incorrect output vectors.

Because the device being measured is not under the observer's control, we have seen that
it is possible for two distinct hypothesis vectors to produce similar output vectors and for dependencies
to exist among output values along a path. Both these features give rise to difficulties
which must be overcome to proceed with the analysis. Consequently, we shall make an assumption,
referred to as the differential bias assumption, which will permit the analysis to be completed
and which, in addition, is reasonable from an intuitive viewpoint. Generally, this assumption
implies that once an error is made in the decoding, a bias will be produced in later hypothesized
outputs which acts in the same way that the addition of an extra noise source would. This
appearance of additional noise in the data will indicate to the decoder that an error was made and
that a retracing procedure should be started. The differential bias assumption itself will be defined
precisely in Sec. III-E.

III.  AVERAGE  NUMBER  OF  COMPUTATIONS 

A.  Introduction 

In  this  section,  we  shall  compute  an  upper  bound  to  the  average  number  of  times  the  decoder 
follows  loop  A  of  Fig.  5  in  decoding  a  branch  of  the  tree.  This  computation  is  similar  to  that 
done  by  Fano.^  Since  loop  A  must  be  taken  for  the  decoder  to  move  forward,  the  number  of  times 
the  decoder  follows  it  is  within  a  factor  of  two  of  the  total  number  of  computations.  Therefore, 
we  shall  henceforth  define  a  computation  as  one  pass  around  loop  A.  Note  from  Fig.  5  that  loop 
A is traversed when the decoder is accepting a node one level deeper than the current depth.
Thus threshold settings discussed in the next section are compared with the value of the metric
at such a node.

We shall see that when there is a sufficient difference between the correct and incorrect
noise-free  data  points  as  seen  by  the  observer,  it  is  possible  to  decode  a  branch  of  the  tree  with 
a  number  of  computations  that  is  independent  of  the  depth  of  the  node  under  consideration.  In 
addition,  we  shall  see  that  as  this  difference  grows,  the  bound  on  the  computations  will  decrease 
rapidly. 

The  bounds  that  will  be  derived  are  computed  with  the  bias  constant  R  discussed  in  Sec.  II-B 
as  a  parameter.  The  effects  of  various  values  for  this  quantity  are  shown  by  means  of  curves 
derived  for  the  Gaussian  noise  case. 

B.  Splitting  N 

In  the  consideration  of  N,  the  average  number  of  computations  required  per  branch  of  the 
decoding  tree,  it  is  desirable  to  consider  separately  the  average  number  of  computations  made 
in  each  of  three  circumstances.  Before  defining  these  classes,  we  shall  introduce  the  notion  of 
a  reference  node  and  illustrate  in  a  typical  case  the  role  it  plays  in  the  computation.  Any  node 
along  the  correct  path  can  be  regarded  as  the  so-called  reference  node.  In  the  computation  of 
N,  we  consider  all  paths  stemming  from  this  node  and  calculate  the  average  number  of  branches 
along  such  paths  that  must  be  considered.  Once  this  has  been  done,  the  next  reference  node  and 
all paths stemming from it must be considered in the same way. Since each node along the correct
path has a similar set of incorrect paths stemming from it, we can consider the total number
of  computations  on  incorrect  paths  stemming  from  the  reference  node  and  the  total  number  of 
computations  on  the  correct  branch  stemming  from  the  reference  node  as  the  total  number  of 
computations  per  branch. 

In the remainder of this section, we shall refer to an incorrect node as a node along an incorrect
path stemming from the reference node. All other incorrect nodes will be considered
when  the  correct  node  from  which  they  stem  is  considered  to  be  the  reference  node. 

If we recall from Sec. II-G that the threshold takes on values quantized by increments of $T_0$,
we shall find it convenient to define $T_1$ as the highest value of the threshold still below the value
of the metric at the reference node. In addition, since the decoder operates only on metric
changes, we can choose its reference to be arbitrary. For convenience, we assume that T = 0
at the reference node.
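
The definition of $T_1$ can be illustrated with a small sketch. It assumes that the admissible thresholds are the integer multiples of $T_0$ and that "still below" means strictly below; both are assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def highest_threshold_below(metric: float, t0: float) -> float:
    """Largest integer multiple of t0 that lies strictly below `metric`.

    Assumes quantized thresholds of the form m * t0 with integer m.
    """
    return (math.ceil(metric / t0) - 1) * t0


# With the metric referred to zero at the reference node, the highest
# threshold strictly below it is one increment down.
t1 = highest_threshold_below(0.0, 1.0)
```

Under this convention $T_1$ always lies within one increment $T_0$ below the metric at the reference node.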

It will be convenient to divide the number of computations to decode one branch into three
parts. First, there will be one computation each time the decoder returns to the reference node
and tests the correct branch. Let $N_c$ denote the average number of such computations. Second,
there are those computations required to consider incorrect nodes when the threshold is set at
$T_1$ and at various levels above $T_1$. Denote this average number by $N_i^+$. Finally, there are those
computations required to consider all incorrect nodes when the threshold is set at various levels
below $T_1$. We let $N_i^-$ be the average number of computations in this category.

Although  it  is  possible  that  all  the  incorrect  nodes  with  metric  above  a  particular  threshold 
will  be  considered  by  the  decoder,  many  may  not  because  of  the  specific  way  in  which  the  metric 
varies  along  the  path  they  are  on.  To  be  conservative,  we  neglect  the  existence  of  such  metric 
variations  and  bound  the  desired  result  by  one  obtained  by  considering  them  all.  Thus 

$$N \le N_c + N_i^+ + N_i^- \tag{1}$$

where $N_c$ is the average number of computations on the correct branch, $N_i^+$ is the average number
along incorrect paths when the threshold is set at or above $T_1$, and $N_i^-$ is the average number
along incorrect paths when the threshold is set at some value below $T_1$.

In  the  calculation  of  N,  we  neglect  the  fact  that  in  measurement  problems  of  interest  the 
tree  is  of  finite  depth  and  instead  we  assume  that  the  depth  is  infinite.  Clearly,  this  is  an  upper 
bound  to  the  average  number  of  computations  required  for  a  finite  tree,  since  the  additional  depths 
of  an  infinite  tree  provide  more  branches  that  the  decoder  may  have  to  investigate. 

In  particular,  if  the  size  of  the  tree  is  increased,  there  will  be  more  possibilities  in  which 
an event causing error could occur. Thus the number of computations to decode one branch increases
as the size of the tree beyond it increases, and in the limit the tree can grow to infinite
size. 

Once the infinite-depth tree is assumed, it may be noted that the average number of computations
to decode the correct branch stemming from a reference node is independent of the reference
node's depth. This is because the number of computations depends on the behavior of the
metric along paths stemming from the reference node, and the composition of the set of such
paths is independent of the reference node.

C.  Events  Contributing  to  Partial  Averages 

We consider $N_c$ first. Since succeeding branches on the correct path will be considered
when the nodes from which they stem are regarded as reference nodes, we need consider only
the first branch. This branch will, of course, be considered at least once and it will be reconsidered
once for each threshold below $T_1$ below which the correct path falls. In particular, the
decoder will not return to the reference node if the total metric does not fall below $T_1$, but will
do so once for each different threshold value below $T_1$ used by the decoder.

Define P(T) as the probability that the total metric falls below T somewhere along the correct
path. Using this quantity, we can bound $N_c$ as

$$N_c \le 1 + \sum_{j=0}^{\infty} P(T)\Big|_{T = T_1 - jT_0}. \tag{2}$$

As will be seen later, and is heuristically obvious, P(T) decreases with decreasing T. Therefore,

$$N_c \le 1 + \sum_{j=0}^{\infty} P(T)\Big|_{T = -jT_0}. \tag{3}$$

Next we consider nodes along incorrect paths stemming from the reference node which are
considered when the threshold is set at a value $T^* \ge T_1$. It is possible that all incorrect nodes
above such a threshold will be considered once for each threshold value above or at $T_1$. The
incorrect nodes in this category may or may not be considered by the decoder, depending on the
behavior of the metric on the correct path and on the manner in which the metric varies along
incorrect paths. To be conservative, we assume that all nodes above $T^* \ge T_1$ will be considered.
This is illustrated in Fig. 7(a). The incorrect path's metric exceeds that of the correct path at
the reference node and an error results. Then the threshold is eventually raised to $T_1 + 2T_0$.
Before the decoder returns to the reference node, one computation on the incorrect path will be
made with the threshold at $T_1 + 2T_0$, two at $T_1 + T_0$, and two with it at $T_1$.

Fig. 7. Typical metric behavior. (a) Error when metric increases on correct path. (b) Error when metric decreases on correct path. (c) Error when metric decreases on both correct and incorrect paths.

Therefore, we see that if $N(T^*)$ is defined as the average number of nodes along incorrect
paths at which the total metric equals or exceeds $T^*$, $N_i^+$ is upper bounded by the sum

$$N_i^+ \le \sum_{j=0}^{\infty} N(T^*)\Big|_{T^* = T_1 + jT_0}. \tag{4}$$

We shall see later that $N(T^*)$ increases with decreasing $T^*$ so that

$$N_i^+ \le \sum_{j=-1}^{\infty} N(T^*)\Big|_{T^* = jT_0}. \tag{5}$$

Finally, we consider $N_i^-$, the number of computations made on incorrect paths for threshold
settings below $T_1$. Such branches will be considered only if the correct path falls below $T_1$ at
some depth. In fact, these branches will be reconsidered once for each threshold value $T^* < 0$
for which the correct path falls below $T^* + T_0$. This may be illustrated as in Fig. 7(b). In this
example, one incorrect node will be tried with the threshold at $T_1$, one at $T_1 - T_0$, and two at
$T_1 - 2T_0$.

Consequently, if we define $N(T^*|T)$ as the average number of nodes on an incorrect path
exceeding $T^*$ when the correct path falls below T, we can upper bound $N_i^-$ by the sum

$$N_i^- \le \sum_{j=0}^{\infty} N(T^*|T)\Big|_{T^* = T_1 - (j+1)T_0,\; T = T_1 - jT_0}\; P(T)\Big|_{T = T_1 - jT_0} \tag{6}$$

$$\le \sum_{j=0}^{\infty} N(T^*|T)\Big|_{T^* = -(j+2)T_0,\; T = -jT_0}\; P(T)\Big|_{T = -jT_0} \tag{7}$$

where we have again used the monotone properties of P(T) and $N(T^*)$ which will be discussed
later.†

The reader may note that if the correct path falls below $T \le T_1$, some incorrect nodes may
be considered with the threshold setting above $T_1$. Such a case is illustrated in Fig. 7(c). Since
the metric on the correct path fell below $T_1$ and also fell below that of the incorrect path shown,
the incorrect path was tried by the decoder. At one point, node A will be considered for
$T^* = T_1 + T_0$. Such a computation would be included in $N_i^+$ (and also $N_i^-$) despite the fact that this path
would be taken only if the metric on the correct path falls below $T_1$.

† A slight improvement in the bound can be obtained if this monotone condition is not imposed until the summation
is performed.

D. Chernoff Bounds to Probabilities

The average number of computations has been upper bounded by three sums involving two
quantities P(T) and $N(T^*)$, the probability of the correct path falling below T and the average
number of incorrect nodes above $T^*$, respectively. In this section, these quantities are upper
bounded by means of the well-known Chernoff bound.¹¹

This bound states that if x is a random variable, F(x) is its cumulative distribution function,
and $\gamma(r)$ is the corresponding moment generating function,

$$\gamma(r) = \int_{-\infty}^{\infty} e^{rx}\, dF(x)$$

then

$$F(x) \le \gamma(r)\, e^{-rx}, \quad \text{any } r \le 0 \tag{8}$$

and

$$1 - F(x) \le \gamma(r)\, e^{-rx}, \quad \text{any } r \ge 0. \tag{9}$$

These inequalities have been extremely valuable in the analysis of sequential decoding of tree
encoded messages and we shall find them very useful here as well.
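
As a numerical illustration of Eq. (9), the following sketch compares the exact upper tail of a zero-mean, unit-variance Gaussian variable with the Chernoff bound. The Gaussian choice and the function names are assumptions made for illustration; for this density $\gamma(r) = e^{r^2/2}$, and the exponent is smallest at r = x.

```python
import math


def gaussian_upper_tail(x: float) -> float:
    """Exact 1 - F(x) for a zero-mean, unit-variance Gaussian variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def chernoff_upper_tail(x: float, r: float) -> float:
    """gamma(r) * exp(-r * x) with gamma(r) = exp(r**2 / 2), for r >= 0."""
    return math.exp(0.5 * r * r - r * x)


for x in (0.5, 1.0, 2.0, 3.0):
    # r = x minimizes r**2 / 2 - r * x, giving the tightest bound of this form.
    print(x, gaussian_upper_tail(x), chernoff_upper_tail(x, r=x))
```

The bound holds for every nonnegative r, which is what later allows r and t to be chosen freely to tighten the sums.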

Let $P_k(T) = \Pr(M_k < T)$ be the probability that the value of the metric at the k-th node beyond
the reference node on the correct path is smaller than some value T. We observe that on
the correct path the metric increment is $R + \ln p_n(n_j)$, where $n_j = y_j - z_j$. Thus the behavior of
the metric on a correct path depends only on the noise samples.

Let $\gamma_c^{(k)}(r)$ be the moment generating function of the metric on a correct path of length k.
That is,

$$\gamma_c^{(k)}(r) = \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, \exp\Big\{ r \sum_{j=1}^{k} [R + \ln p_n(n_j)] \Big\}\, dn_1 \cdots dn_k \tag{10}$$

where $p_n(n)$ is the probability density function of the noise. Then the Chernoff bound implies that

$$P_k(T) \le \gamma_c^{(k)}(r)\, e^{-rT} = \exp\{\mu_c^{(k)}(r) - rT\}, \quad r \le 0 \tag{11}$$

where

$$\mu_c^{(k)}(r) = \ln \gamma_c^{(k)}(r).$$

Next we turn to $N(T^*)$, which was defined in Sec. III-C as the average number of incorrect nodes
above $T^*$. Let $P_k(T^*)$ be the probability that the value of the metric at the k-th node along a
particular incorrect path stemming from the reference node exceeds a value $T^*$. This quantity
depends on the particular incorrect path under consideration.

We now note that if we consider D quantization levels there is a total of $(D-1)D^{k-1}$ completely
incorrect paths of length k stemming from the reference node. Let $P_k^m(T^*)$ be the largest
$P_k(T^*)$ of those computed for all these incorrect paths. Then the average number of incorrect
nodes at depth k exceeding $T^*$ is given by

$$N_k(T^*) = \sum_{i=1}^{(D-1)D^{k-1}} \Pr(\text{metric on } i\text{-th incorrect path of length } k \text{ exceeds } T^*).$$

Since $P_k(T^*) \le P_k^m(T^*)$ for all incorrect paths it follows that

$$N_k(T^*) \le P_k^m(T^*)\,(D-1)D^{k-1}. \tag{12}$$
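
The path count used in Eq. (12) can be checked by brute-force enumeration. The sketch below assumes that a "completely incorrect" path is one whose first branch already differs from the correct branch at the reference node; that reading, and the function name, are assumptions made for illustration.

```python
from itertools import product


def count_first_branch_errors(num_levels: int, depth: int, correct_path) -> int:
    """Count depth-`depth` paths whose first branch differs from the correct one."""
    return sum(
        1
        for path in product(range(num_levels), repeat=depth)
        if path[0] != correct_path[0]
    )


# Example: D = 3 quantization levels, paths of length k = 4.
count = count_first_branch_errors(3, 4, correct_path=(0, 0, 0, 0))
```

The enumeration grows as $D^k$, which is why the analysis replaces it by the factor $(D-1)D^{k-1}$ times the largest single-path probability.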

But the Chernoff bounding procedure allows us to upper bound $P_k^m(T^*)$. Note that

$$p_n(y_j | z_j^*) = p_n(y_j - z_j^*) = p_n(z_j - z_j^* + n_j).$$

Let $\gamma_i^{(k)}(t)$ be the moment generating function of the metric along the incorrect path of length k
giving rise to $P_k^m(T^*)$:

$$\gamma_i^{(k)}(t) = \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, \exp\Big\{ t \sum_{j=1}^{k} [R + \ln p_n(z_j - z_j^* + n_j)] \Big\}\, dn_1 \cdots dn_k \tag{13}$$

where $p_n(n_j)$ is the probability density for the j-th one of the k independent noise samples, $z_j$ is
the j-th noise-free output on the correct path, and $z_j^*$ is the j-th noise-free output on the incorrect
path giving rise to the maximum $P_k(T^*)$, $P_k^m(T^*)$. Then by the Chernoff bound

$$P_k^m(T^*) \le \gamma_i^{(k)}(t)\, e^{-tT^*} \quad \text{for any } t \ge 0 \tag{14}$$

so that combining Eqs. (12) and (14)

$$N_k(T^*) \le (D-1)D^{k-1}\, \gamma_i^{(k)}(t)\, e^{-tT^*}. \tag{15}$$

$N(T^*|T)$ can be bounded using a similar technique. Let

$$P_{k|n}(T^*|T) = \Pr(M_k^* > T^* \mid M_n < T)$$

be the conditional probability that the value of the metric $M_k^*$ at the k-th node along a particular
incorrect path stemming from the reference node exceeds a value $T^*$ when the metric $M_n$ at the
n-th node along the correct path falls below T. This quantity depends on the particular incorrect
path under consideration.

As before, we note that there are a total of $(D-1)D^{k-1}$ completely incorrect paths stemming
from the reference node. Let $P_{k|n}^m(T^*|T)$ be the largest $P_{k|n}(T^*|T)$ of those computed for all
these incorrect paths. Then the same procedure can be employed by assuming that

$$P_{k|n}(T^*|T) \le P_{k|n}^m(T^*|T) \tag{16}$$

for all incorrect paths. Therefore, the average number of nodes along incorrect paths of length
k above $T^*$, given that the correct path is below T at depth n, is bounded by

$$N_{k|n}(T^*|T) \le P_{k|n}^m(T^*|T)\,(D-1)D^{k-1}. \tag{17}$$

If we now multiply both sides of this inequality by $P_n(T)$ we obtain

$$P_n(T)\, N_{k|n}(T^*|T) \le \Pr(M_n < T,\; M_k^* > T^*)\,(D-1)D^{k-1}. \tag{18}$$

It is worthy of note that the right-hand side of this expression is also an upper bound to the joint
probability that $M_n < T$ on the correct path and that there is at least one node at distance k along
some incorrect path stemming from the reference node for which $M_k^* \ge T^*$. This bound is due
to the fact that the probability of a union of events is upper bounded by the sum of probabilities
of the individual events.

To further bound this joint probability, we can employ the Chernoff bound in two dimensions.
Note as before that

$$p_n(y_j | z_j) = p_n(y_j - z_j) = p_n(n_j)$$

and

$$p_n(y_j | z_j^*) = p_n(y_j - z_j^*) = p_n(z_j - z_j^* + n_j)$$

when

$$n_j = y_j - z_j.$$

Let $\gamma_i^{(n,k)}(r,t)$ be the joint moment generating function of the metric on the correct path of length
n together with the metric on the incorrect path of length k leading to the maximum $P_{k|n}(T^*|T)$,
$P_{k|n}^m(T^*|T)$. That is,

$$\gamma_i^{(n,k)}(r,t) = \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, \exp\Big\{ r \sum_{j=1}^{n} [R + \ln p_n(n_j)] + t \sum_{j=1}^{k} [R + \ln p_n(z_j - z_j^* + n_j)] \Big\}\, dn_1 \cdots dn_k \tag{19}$$

if $k \ge n$, $t \ge 0$, and $r \le 0$, and

$$\gamma_i^{(n,k)}(r,t) = \int \cdots \int \prod_{j=1}^{n} p_n(n_j)\, \exp\Big\{ r \sum_{j=1}^{n} [R + \ln p_n(n_j)] + t \sum_{j=1}^{k} [R + \ln p_n(z_j - z_j^* + n_j)] \Big\}\, dn_1 \cdots dn_n \tag{20}$$

if $k \le n$, $t \ge 0$, and $r \le 0$. Then

$$\Pr(M_n < T,\; M_k^* > T^*) \le \gamma_i^{(n,k)}(r,t)\, \exp\{-rT - tT^*\} \tag{21}$$

for $t \ge 0$ and $r \le 0$.

Before  proceeding  further  in  the  calculation,  we  make  an  assumption  about  the  incorrect 
paths  in  order  that  all  possible  incorrect  paths  will  not  have  to  be  considered  individually. 


E.  Differential  Bias  Assumption 

The calculation of the moment generating functions defined in Eqs. (19) and (20) is complicated
by the dependencies that exist between the metric values on the correct path and those on the
incorrect path, and by dependencies existing along incorrect paths. Indeed, it does not appear
that their computation is tractable without some simplifying assumption. In this connection, it
may be noted that for an incorrect decision to be discovered, its consequences must produce an
observable discrepancy between the true noise-free data vector and the hypothesized noise-free
data vector. Because of the analog nature of the noise effects under consideration, this discrepancy
must appear as an arithmetic difference in at least some of the vector components depending
on differing hypotheses. Thus the effect of an erroneous decision is to produce a bias in the data.
With these remarks as motivation, we make the following assumption†:

On all incorrect paths, each incorrect output value $z_i^*$ differs from the
corresponding correct output value by at least a constant $\delta$. More precisely,
$|z_i - z_i^*| \ge \delta$ for all incorrect branches.

We shall see that under this condition, the moment generating functions can be calculated without
regard to the dependencies existing along incorrect paths.

A geometric interpretation of this assumption is readily obtained. Consider the noise-free
data vector of each possible tree path of length n as a point in n-dimensional Euclidean space.
The above assumption implies that the components of each incorrect point differ by at least $\delta$
from the corresponding components of the correct point. This is illustrated for two dimensions
in Fig. 8.

Fig. 8. Differential bias assumption.

† This assumption was recently weakened to the requirement that

$$\sum_{i=1}^{k} |z_i - z_i^*| \ge k\delta$$

for all incorrect paths of length k, where the summation extends over the incorrect portion of the path. The
calculation of the moment generating functions under this weakened assumption is sketched in Appendix C. For
Gaussian white noise, the differential bias assumption can be weakened to

$$\sum_{i=1}^{k} (z_i - z_i^*)^2 \ge k\delta^2$$

for all incorrect paths.

F. Moment Generating Functions

Under the differential bias assumption, several simplifications in connection with the required
moment generating functions occur. We first consider $\gamma_c^{(k)}(r)$. By taking advantage of
the independence among noise samples, we obtain

$$\begin{aligned}
\gamma_c^{(k)}(r) &= E\Big[\exp\Big\{ r \sum_{j=1}^{k} [R + \ln p_n(y_j | z_j)] \Big\}\Big], \quad r \le 0 \\
&= e^{rkR}\, \{E[p_n(n)^r]\}^k \\
&= e^{rkR}\, \gamma_1^k(r)
\end{aligned} \tag{22}$$

where

$$\gamma_1(r) = E[p_n(n)^r]. \tag{23a}$$

Thus it is sufficient to calculate $\gamma_1(r)$ which depends only on the probability density function for
the noise.
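
For a concrete case, $\gamma_1(r)$ can be evaluated for zero-mean Gaussian noise of variance $\sigma^2$. The closed form used below, $\gamma_1(r) = (2\pi\sigma^2)^{-r/2}(1+r)^{-1/2}$ for $r > -1$, is worked out here rather than taken from the report, so the sketch checks it against a direct midpoint-rule integration of $p_n(n)^{1+r}$. All names are hypothetical.

```python
import math


def gaussian_pdf(n: float, sigma: float) -> float:
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-n * n / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def gamma1_closed(r: float, sigma: float) -> float:
    """Assumed closed form of E[p_n(n)**r] for Gaussian noise, valid for r > -1."""
    return (2 * math.pi * sigma**2) ** (-r / 2) / math.sqrt(1 + r)


def gamma1_numeric(r: float, sigma: float, half_width: float = 30.0,
                   steps: int = 40_000) -> float:
    """Midpoint-rule estimate of the integral of p_n(n)**(1 + r) dn."""
    lo = -half_width * sigma
    h = 2 * half_width * sigma / steps
    return h * sum(gaussian_pdf(lo + (i + 0.5) * h, sigma) ** (1 + r)
                   for i in range(steps))


def gamma_c(k: int, r: float, bias: float, sigma: float) -> float:
    """Eq. (22): exp(r*k*R) * gamma_1(r)**k, with `bias` standing for R."""
    return math.exp(r * k * bias) * gamma1_closed(r, sigma) ** k
```

Because of Eq. (22), a single one-dimensional integral determines the moment generating function for a correct path of any length k.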

In dealing with $\gamma_i^{(k)}(t)$ and $\gamma_i^{(n,k)}(r,t)$, we shall show that the corresponding true moment
generating functions are upper bounded by generating functions calculated under the differential bias
assumption alone. Thus, although these moment generating functions were defined by Eqs. (13),
(19), and (20) along specific paths, they are upper bounded by moment generating functions
independent of the particular incorrect path. In addition, dependencies along incorrect paths due
to the internal constraints of the transducer are removed from consideration. This upper bound
is made explicit by the following theorem.

Theorem.

If we define

$$\gamma_2(t) = \int p_n(z + n | z)\, p_n(z + n | z + \delta_0)^t\, dn \tag{23b}$$

and if $p_n(y|z)$ is a symmetric, monotone-decreasing function of $|y - z| = |n|$, then

$$\gamma_2(t) \ge \int p_n(z + n | z)\, p_n(z + n | z + \delta)^t\, dn \tag{24}$$

if $\delta \ge \delta_0$ and $t \ge 0$.

Proof.

By assumptions described in the theorem, $p_n(y|z) = p_n(n)$ is a monotone-decreasing symmetric
function of |n|. But a positive power of such a function is also of the same type. Hence
by Lemma 2 in Appendix A, the theorem is proved.
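
The monotone behavior asserted by the theorem can be observed numerically. The sketch below uses a unit-scale Laplace density as one example of a symmetric, monotone-decreasing function of |n| (an illustrative choice, not one made in the report) and evaluates the integral of $p_n(n)\,p_n(n+\delta)^t$ for increasing $\delta$. By symmetry of the density this equals the integral with $p_n(n-\delta)$.

```python
import math


def laplace_pdf(n: float) -> float:
    """Unit-scale Laplace density: symmetric and decreasing in |n|."""
    return 0.5 * math.exp(-abs(n))


def shifted_overlap(delta: float, t: float, pdf=laplace_pdf,
                    lo: float = -50.0, hi: float = 50.0,
                    steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of pdf(n) * pdf(n + delta)**t dn."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        n = lo + (i + 0.5) * h
        total += pdf(n) * pdf(n + delta) ** t
    return total * h


# The integral should not increase as the offset delta grows.
values = [shifted_overlap(delta, t=0.5) for delta in (0.0, 0.5, 1.0, 2.0, 4.0)]
```

This is the property that lets the smallest admissible offset stand in for every incorrect path in the bounds that follow.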

Turning now to $\gamma_i^{(k)}(t)$, we again use the independence of the noise samples. From Eq. (13),
we have

$$\begin{aligned}
\gamma_i^{(k)}(t) &= E\Big[\exp\Big\{ t \sum_{j=1}^{k} [R + \ln p_n(z_j - z_j^* + n_j)] \Big\}\Big], \quad t \ge 0 \\
&= e^{tkR} \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, \exp\Big\{ t \sum_{j=1}^{k} \ln p_n(z_j - z_j^* + n_j) \Big\}\, dn_1 \cdots dn_k \\
&= e^{tkR} \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, p_n(z_j - z_j^* + n_j)^t\, dn_1 \cdots dn_k \\
&= e^{tkR} \prod_{j=1}^{k} \int p_n(n_j)\, p_n(z_j - z_j^* + n_j)^t\, dn_j \\
&\le e^{tkR}\, \gamma_2^k(t)
\end{aligned} \tag{25}$$

where

$$\gamma_2(t) = \int p_n(n)\, p_n(n + \delta)^t\, dn. \tag{26}$$

The inequality follows from the theorem expressed by Eq. (24). Thus $\gamma_i^{(k)}(t)$ is bounded by a
function dependent only on the noise probability density function and on the constant $\delta$, introduced
by the differential bias assumption.

In the same way $\gamma_i^{(n,k)}(r,t)$ can be shown to be bounded by a function depending only on the
probability density function of the noise and on the constant $\delta$. From Eq. (19)

$$\begin{aligned}
\gamma_i^{(n,k)}(r,t) &= \int \cdots \int \prod_{j=1}^{k} p_n(n_j)\, \exp\Big\{ r \sum_{j=1}^{n} [R + \ln p_n(n_j)] + t \sum_{j=1}^{k} [R + \ln p_n(z_j - z_j^* + n_j)] \Big\}\, dn_1 \cdots dn_k \\
&= \exp\{nrR + ktR\} \int \cdots \int \prod_{j=1}^{n} p_n(n_j)^{1+r}\, p_n(z_j - z_j^* + n_j)^t\, dn_1 \cdots dn_n \\
&\quad \times \int \cdots \int \prod_{j=n+1}^{k} p_n(n_j)\, p_n(z_j - z_j^* + n_j)^t\, dn_{n+1} \cdots dn_k \\
&\le \exp\{nrR + ktR\}\, \gamma_3^n(r,t)\, \gamma_2^{k-n}(t)
\end{aligned} \tag{27}$$

for $k \ge n$, $t \ge 0$, $r \le 0$, where

$$\gamma_3(r,t) = \int p_n(n)^{1+r}\, p_n(n + \delta)^t\, dn \tag{28}$$

and $\gamma_2(t)$ is defined in Eq. (26). If $n \ge k$, Eq. (27) is replaced by

$$\gamma_i^{(n,k)}(r,t) \le \exp\{nrR + ktR\}\, \gamma_3^k(r,t)\, \gamma_1^{n-k}(r) \tag{29}$$

where $\gamma_1(r)$ is defined in Eq. (23a).

The fact that a different bound obtains in the two cases will make later computations very
laborious. However, a simple application of the Schwarz inequality¹² gives us an upper bound
that is common to the two expressions above. This bound is derived in Appendix A as Lemma 1
and indicates that we can define two functions, each of which is simply related to the pertinent
moment generating function in the following manner:

$$\gamma_1'(r) = [\gamma_1(2r)]^{1/2}$$

$$\gamma_2'(t) = [\gamma_2(2t)]^{1/2}.$$

Then

$$\gamma_3(r,t) \le \gamma_1'(r)\, \gamma_2'(t)$$

$$\gamma_1(r) \le \gamma_1'(r)$$

$$\gamma_2(t) \le \gamma_2'(t). \tag{30}$$
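
The three inequalities of Eq. (30) can be checked numerically for a particular density. The sketch below assumes unit-variance Gaussian noise and evaluates $\gamma_1$, $\gamma_2$, and $\gamma_3$ by midpoint-rule integration; the density, the value of $\delta$, and the function names are illustrative assumptions.

```python
import math


def pdf(n: float) -> float:
    """Zero-mean, unit-variance Gaussian density (illustrative noise model)."""
    return math.exp(-0.5 * n * n) / math.sqrt(2 * math.pi)


def integrate(f, lo: float = -30.0, hi: float = 30.0, steps: int = 12_000) -> float:
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def gamma1(r: float) -> float:
    return integrate(lambda n: pdf(n) ** (1 + r))


def gamma2(t: float, delta: float) -> float:
    return integrate(lambda n: pdf(n) * pdf(n + delta) ** t)


def gamma3(r: float, t: float, delta: float) -> float:
    return integrate(lambda n: pdf(n) ** (1 + r) * pdf(n + delta) ** t)


def schwarz_gap(r: float, t: float, delta: float) -> float:
    """gamma1'(r) * gamma2'(t) - gamma3(r, t); nonnegative if Eq. (30) holds."""
    return math.sqrt(gamma1(2 * r) * gamma2(2 * t, delta)) - gamma3(r, t, delta)
```

The price of the common bound is the doubling of the arguments, which restricts r to values above -1/2 for this density.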

Applying these bounds, we obtain from Eqs. (27) and (30) the following bound to $\gamma_i^{(n,k)}(r,t)$
which holds for all values of n and k:

$$\gamma_i^{(n,k)}(r,t) \le \exp\{nrR + ktR\}\, [\gamma_1'(r)]^n\, [\gamma_2'(t)]^k \tag{31a}$$

for $r \le 0$ and $t \ge 0$.

Hence we can deal with $\gamma_1(r)$ and $\gamma_2(t)$ which depend only on the noise density function and
the constant $\delta$. It will be convenient to define

$$\mu_1(r) = \ln \gamma_1(r) = \ln E[p_n(n)^r], \quad r \le 0$$

and

$$\mu_2(t) = \ln \gamma_2(t) = \ln E[p_n(n + \delta)^t], \quad t \ge 0 \tag{31b}$$

so that from Eqs. (31a) and (31b)

$$\gamma_i^{(n,k)}(r,t) \le \exp\{n[rR + \tfrac{1}{2}\mu_1(2r)] + k[tR + \tfrac{1}{2}\mu_2(2t)]\} \tag{32}$$

for $r \le 0$ and $t \ge 0$.

Now that we have discussed the moment generating functions and have introduced the differential
bias assumption, we can return to the main objective, that of bounding P(T), $N(T^*)$, and
$N(T^*|T)$: the probability that the metric on the correct path falls below T for some depth, the
average number of incorrect nodes along all incorrect paths for which the metric exceeds $T^*$,
and the same quantity conditional on the correct path's falling below T. Since the probability of
a union of events is less than or equal to the sum of the probabilities of the individual events, we can upper bound
these quantities by the proper sums over n and k. Thus

$$P(T) \le \sum_{n=1}^{\infty} P_n(T) \tag{33}$$

$$N(T^*) \le \sum_{k=1}^{\infty} N_k(T^*) \tag{34}$$

$$N(T^*|T) \le \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} N_{k|n}(T^*|T)\, P_n(T) \tag{35}$$

where $P_n(T)$ is the probability that the correct path is below T at depth n, $N_k(T^*)$ is the average
number of nodes along incorrect paths at depth k which exceed $T^*$, and $N_{k|n}(T^*|T)$ is the average
number of nodes along incorrect paths at depth k which exceed $T^*$ when the correct path
falls below T at depth n, as discussed in Sec. III-D.


G.  Performing  the  Sums 

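The contents of the previous sections can be summarized by indicating three summations to
be performed. Combining Eqs. (3), (11), (22), (30), and (33), we obtain

$$\begin{aligned}
N_c &\le 1 + \sum_{j=0}^{\infty} \sum_{n=1}^{\infty} \exp\{n[rR + \tfrac{1}{2}\mu_1(2r)] + jrT_0\} \\
&= 1 + \sum_{j=0}^{\infty} \sum_{n=1}^{\infty} \exp\{jrT_0 + n\alpha(r)\}, \quad r \le 0
\end{aligned} \tag{36}$$

where

$$\alpha(r) = rR + \tfrac{1}{2}\mu_1(2r).$$

From Eqs. (5), (15), (25), (30), and (34), we get

$$\begin{aligned}
N_i^+ &\le \sum_{j=-1}^{\infty} \sum_{k=1}^{\infty} (D-1)D^{k-1} \exp\{k[tR + \tfrac{1}{2}\mu_2(2t)] - jtT_0\} \\
&= \frac{D-1}{D} \sum_{j=-1}^{\infty} \sum_{k=1}^{\infty} \exp\{-jtT_0 + k[tR + \tfrac{1}{2}\mu_2(2t) + \ln D]\} \\
&= \frac{D-1}{D} \sum_{j=-1}^{\infty} \sum_{k=1}^{\infty} \exp\{-jtT_0 + k\beta(t)\}, \quad t \ge 0
\end{aligned} \tag{37}$$

where

$$\beta(t) = tR + \tfrac{1}{2}\mu_2(2t) + \ln D.$$

Finally by considering Eq. (7) and successively substituting Eqs. (18), (21), (32), and (35), we obtain

$$\begin{aligned}
N_i^- &\le \sum_{j=0}^{\infty} \sum_{k=1}^{\infty} \sum_{n=1}^{\infty} (D-1)D^{k-1} \exp\{jrT_0 + (j+2)tT_0 + n[rR + \tfrac{1}{2}\mu_1(2r)]\} \exp\{k[tR + \tfrac{1}{2}\mu_2(2t)]\} \\
&= \frac{D-1}{D} \sum_{j=0}^{\infty} \sum_{k=1}^{\infty} \exp\{(j+2)tT_0 + k[tR + \tfrac{1}{2}\mu_2(2t) + \ln D]\} \sum_{n=1}^{\infty} \exp\{jrT_0 + n[rR + \tfrac{1}{2}\mu_1(2r)]\} \\
&= \frac{D-1}{D} \sum_{j=0}^{\infty} \sum_{k=1}^{\infty} \exp\{(j+2)tT_0 + k\beta(t)\} \sum_{n=1}^{\infty} \exp\{jrT_0 + n\alpha(r)\}
\end{aligned} \tag{38}$$

where

$$\beta(t) = tR + \tfrac{1}{2}\mu_2(2t) + \ln D, \quad t \ge 0 \tag{39}$$

and

$$\alpha(r) = rR + \tfrac{1}{2}\mu_1(2r), \quad r \le 0 \tag{40}$$

and $\mu_1(2r)$ and $\mu_2(2t)$ are given in Eq. (31b).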
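
To make $\alpha(r)$ and $\beta(t)$ concrete, the sketch below evaluates them for zero-mean Gaussian noise of variance $\sigma^2$. The closed forms $\mu_1(r) = -\tfrac{r}{2}\ln(2\pi\sigma^2) - \tfrac{1}{2}\ln(1+r)$ and $\mu_2(t) = \mu_1(t) - t\delta^2/[2\sigma^2(1+t)]$ are worked out here by completing the square and are not quoted from the report; the parameter name `bias` stands for R and `num_levels` for D.

```python
import math


def mu1(r: float, sigma: float) -> float:
    """Assumed Gaussian closed form of ln E[p_n(n)**r]; requires r > -1."""
    return -0.5 * r * math.log(2 * math.pi * sigma**2) - 0.5 * math.log1p(r)


def mu2(t: float, sigma: float, delta: float) -> float:
    """Assumed Gaussian closed form of ln E[p_n(n + delta)**t]; requires t > -1."""
    return mu1(t, sigma) - t * delta**2 / (2 * sigma**2 * (1 + t))


def alpha(r: float, bias: float, sigma: float) -> float:
    """Eq. (40): r*R + mu1(2r)/2, defined here for -1/2 < r <= 0."""
    return r * bias + 0.5 * mu1(2 * r, sigma)


def beta(t: float, bias: float, sigma: float, delta: float,
         num_levels: int) -> float:
    """Eq. (39): t*R + mu2(2t)/2 + ln D, for t >= 0."""
    return t * bias + 0.5 * mu2(2 * t, sigma, delta) + math.log(num_levels)
```

Both functions pass through fixed values at the origin, $\alpha(0) = 0$ and $\beta(0) = \ln D$, whatever the noise density, because $\mu_1(0) = \mu_2(0) = 0$.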

The parameters r and t appearing in these summations were introduced in the Chernoff
bounds to terms that have been combined in the calculations. It is not necessary, nor is it desirable,
that r and t be chosen the same for all these terms, but rather each of these parameters
should be chosen to minimize the bound. Thus the optimum r and t are really functions
of the summing indices j, k, and n.

Because of the arithmetic complications produced by complete optimization, it is desirable
to pick only two values for r and two for t. Thus we choose

$$r = \begin{cases} r_0, & n < n_1 \\ r_1, & n \ge n_1 \end{cases} \tag{41}$$

and

$$t = \begin{cases} t_0, & k < k_1 \\ t_1, & k \ge k_1. \end{cases} \tag{42}$$

Since the exponent in Eq. (38) is dominated by the terms independent of n and k when n and k
are small, we shall choose $r_0$ and $t_0$ such that the coefficients of n and k are each zero. Then,
as n and k increase beyond $n_1$ and $k_1$, respectively, and this term becomes more important,
we shall choose $r_1$ and $t_1$ to minimize the coefficient. More precisely, $r_0$, $r_1$, $t_0$, and $t_1$ are
chosen to satisfy

$$\alpha(r_0) = 0 \tag{43}$$

$$\beta(t_0) = 0 \tag{44}$$

$$\alpha'(r_1) = 0 \tag{45}$$

$$\beta'(t_1) = 0. \tag{46}$$

In addition, $n_1$ and $k_1$ are chosen as those values of n and k for which the bounds obtained using
$r_0$ and $t_0$ just exceed those obtained using $r_1$ and $t_1$.
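
Equations (43) to (46) can be solved numerically once a noise density is fixed. The sketch below assumes zero-mean Gaussian noise of variance $\sigma^2$, for which (by a calculation made here, not in the report) $\alpha(r) = rc - \tfrac{1}{4}\ln(1+2r)$ and $\beta(t) = tc - \tfrac{1}{4}\ln(1+2t) - st/(1+2t) + \ln D$, with $c = R - \tfrac{1}{2}\ln(2\pi\sigma^2)$ and $s = \delta^2/2\sigma^2$. It further assumes $c > 1/2$, $c - 1/2 - s < 0$, and $\beta(t_1) < 0$ so that the roots exist; the bisection helper and all names are hypothetical.

```python
import math


def bisect(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_r0_r1(bias: float, sigma: float):
    """Return (r0, r1, alpha(r1)) for Gaussian noise; `bias` stands for R."""
    c = bias - 0.5 * math.log(2 * math.pi * sigma**2)
    alpha = lambda r: r * c - 0.25 * math.log1p(2 * r)
    alpha_prime = lambda r: c - 0.5 / (1 + 2 * r)
    eps = 1e-12
    r1 = bisect(alpha_prime, -0.5 + eps, 0.0)   # Eq. (45)
    r0 = bisect(alpha, -0.5 + eps, r1)          # Eq. (43), root below r1
    return r0, r1, alpha(r1)


def solve_t0_t1(bias: float, sigma: float, delta: float, num_levels: int,
                t_max: float = 50.0):
    """Return (t0, t1, beta(t1)) for Gaussian noise under the stated assumptions."""
    c = bias - 0.5 * math.log(2 * math.pi * sigma**2)
    s = delta**2 / (2 * sigma**2)
    beta = lambda t: (t * c - 0.25 * math.log1p(2 * t)
                      - s * t / (1 + 2 * t) + math.log(num_levels))
    beta_prime = lambda t: c - 0.5 / (1 + 2 * t) - s / (1 + 2 * t) ** 2
    t1 = bisect(beta_prime, 0.0, t_max)         # Eq. (46)
    t0 = bisect(beta, 0.0, t1)                  # Eq. (44), root below t1
    return t0, t1, beta(t1)
```

For example, `solve_r0_r1(1 + 0.5 * math.log(2 * math.pi), 1.0)` corresponds to c = 1, for which Eq. (45) gives $r_1 = -1/4$ exactly.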

That is, we choose $n_1$ such that the term corresponding to $n = n_1 - 1$ is smaller for $r = r_0$
than for $r = r_1$, whereas the term corresponding to $n = n_1$ is larger for $r = r_0$ than for $r = r_1$.
Thus

$$jr_0T_0 \le (n_1 - 1)\,\alpha(r_1) + jr_1T_0 \tag{47}$$

$$jr_0T_0 > n_1\,\alpha(r_1) + jr_1T_0. \tag{48}$$

That is, $n_1$ is defined as an integer satisfying

$$\frac{j(r_0 - r_1)T_0}{\alpha(r_1)} < n_1 \le \frac{j(r_0 - r_1)T_0}{\alpha(r_1)} + 1 \tag{49}$$

for $jT_0 > 0$ and $n_1 = 0$ for $jT_0 = 0$. Similarly, we define $k_1$ as the integer satisfying

$$\frac{j(t_0 - t_1)T_0}{\beta(t_1)} < k_1 \le \frac{j(t_0 - t_1)T_0}{\beta(t_1)} + 1 \tag{50}$$

for $jT_0 > 0$ and $k_1 = 0$ for $jT_0 = 0$. Therefore,

$$\exp[n_1\alpha(r_1) + jr_1T_0] \le \exp[jr_0T_0] \tag{51}$$

and

$$\exp[k_1\beta(t_1) + jt_1T_0] \le \exp[jt_0T_0]. \tag{52}$$

We shall carry out the summations first, and then discuss the conditions under which solutions
can be obtained.

$$ N_c \le 1 + \sum_{j=0}^{\infty} \left\{ \sum_{n=1}^{n_1-1} \exp[j r_0 T_0] + \sum_{n=n_1}^{\infty} \exp[j r_1 T_0 + n\alpha(r_1)] \right\} . $$

Using Eq. (49) and the relation $\sum_{i=0}^{\infty} x^i = 1/(1 - x)$,

$$ N_c \le 1 + \sum_{j=0}^{\infty} \left\{ \frac{j(r_0 - r_1)T_0}{\alpha(r_1)} + \frac{1}{1 - \exp[\alpha(r_1)]} \right\} \exp[j r_0 T_0] . $$

Next, using the relation $\sum_{i=0}^{\infty} i x^i = x/(1 - x)^2$,

$$ N_c \le 1 + \frac{(r_0 - r_1)T_0}{\alpha(r_1)} \, \frac{e^{r_0 T_0}}{(1 - e^{r_0 T_0})^2} + \frac{1}{[1 - e^{\alpha(r_1)}]\,(1 - e^{r_0 T_0})} . \tag{53} $$
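As a sanity check on Eq. (53), the double sum can be evaluated directly and compared with the closed-form bound. The sketch below is only illustrative: the numerical values of $r_0$, $r_1$, $\alpha(r_1)$, and $T_0$ are assumptions, the function names are hypothetical, and for $j = 0$ the sketch takes $n_1 = 1$ so that the inner sum starts at $n = 1$.

```python
import math

def nc_direct(r0, r1, alpha_r1, T0, j_max=400):
    """Evaluate 1 + sum_j { sum_{n<n1} exp[j r0 T0] + sum_{n>=n1} exp[j r1 T0 + n alpha(r1)] }."""
    total = 1.0
    for j in range(j_max):
        x = j * (r0 - r1) * T0 / alpha_r1
        n1 = math.floor(x) + 1                      # Eq. (49)
        head = (n1 - 1) * math.exp(j * r0 * T0)     # terms using r0, where alpha(r0) = 0
        # Geometric tail over n >= n1 using r1.
        tail = math.exp(j * r1 * T0 + n1 * alpha_r1) / (1.0 - math.exp(alpha_r1))
        total += head + tail
    return total

def nc_bound(r0, r1, alpha_r1, T0):
    """Closed-form bound of Eq. (53)."""
    e = math.exp(r0 * T0)
    first = (r0 - r1) * T0 / alpha_r1 * e / (1.0 - e) ** 2
    second = 1.0 / ((1.0 - math.exp(alpha_r1)) * (1.0 - e))
    return 1.0 + first + second

# Hypothetical values for illustration only.
params = (-0.40, -0.25, -0.07, 1.0)
direct_value = nc_direct(*params)
bound_value = nc_bound(*params)
```

For the assumed values the directly summed quantity stays below the closed-form bound, as the derivation requires.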


In the sum for $N_i^+$, we note that except for the first term, we are dealing with positive threshold
values. Thus the bound describing the choice of $k_1$ is not valid and instead we use a single
value of t, $t_2$, for all nonnegative values of j. The final bound can thus be optimized over this
additional parameter. Thus, from Eqs. (37), (42), and (50),

$$ N_i^+ \le \frac{D-1}{D} \left( \exp[t_0 T_0]\,(k_1 - 1) + \exp[t_1 T_0 + k_1\beta(t_1)] \sum_{k=0}^{\infty} \exp[k\beta(t_1)] + \sum_{j=-\infty}^{0} \exp[j t_2 T_0] \sum_{k=1}^{\infty} \exp[k\beta(t_2)] \right) $$

$$ N_i^+ \le \frac{D-1}{D} \left( \frac{(t_0 - t_1)T_0 \exp[t_0 T_0]}{\beta(t_1)} + \frac{\exp[t_0 T_0]}{1 - \exp[\beta(t_1)]} + \frac{\exp[\beta(t_2)]}{\{1 - \exp[-t_2 T_0]\}\{1 - \exp[\beta(t_2)]\}} \right) . \tag{54} $$


Finally, from Eqs. (38), (41), (42), (49), and (50), as well as the relation $\sum_{i=0}^{\infty} i^2 x^i = x(1 + x)/(1 - x)^3$,

$$ N_i^- \le \frac{D-1}{D} \sum_{j=0}^{\infty} \left\{ \sum_{k=1}^{k_1-1} \exp[(j+2)t_0 T_0] + \sum_{k=k_1}^{\infty} \exp[(j+2)t_1 T_0 + k\beta(t_1)] \right\} \left\{ \sum_{n=1}^{n_1-1} \exp[j r_0 T_0] + \sum_{n=n_1}^{\infty} \exp[j r_1 T_0 + n\alpha(r_1)] \right\} $$

$$ \le \frac{D-1}{D} \exp[2t_0 T_0] \Bigg[ \frac{(r_0 - r_1)(t_0 - t_1)T_0^2}{\alpha(r_1)\beta(t_1)} \, \frac{\exp[(t_0 + r_0)T_0]\,(1 + \exp[(t_0 + r_0)T_0])}{\{1 - \exp[(t_0 + r_0)T_0]\}^3} $$

$$ + \left( \frac{2(r_0 - r_1)(t_0 - t_1)T_0^2}{\alpha(r_1)\beta(t_1)} + \frac{(t_0 - t_1)T_0}{\{1 - \exp[\alpha(r_1)]\}\beta(t_1)} + \frac{(r_0 - r_1)T_0}{\alpha(r_1)\{1 - \exp[\beta(t_1)]\}} \right) \frac{\exp[(t_0 + r_0)T_0]}{\{1 - \exp[(t_0 + r_0)T_0]\}^2} $$

$$ + \left( \frac{2(t_0 - t_1)T_0}{\{1 - \exp[\alpha(r_1)]\}\beta(t_1)} + \frac{1}{\{1 - \exp[\alpha(r_1)]\}\{1 - \exp[\beta(t_1)]\}} \right) \frac{1}{1 - \exp[(t_0 + r_0)T_0]} \Bigg] \tag{55} $$

if $\alpha(r_1) < 0$, $\beta(t_1) < 0$, and $t_0 + r_0 < 0$.

We shall see that these conditions on $r_0$, $t_0$, $r_1$, and $t_1$ place restrictions on the range of
the ratio $\delta^2/\sigma^2$ that permits convergence of the sums. If these conditions can be satisfied, we
shall have shown that the number of computations for decoding one branch is bounded by a
constant.


H.  Existence  Conditions 

It remains to show that solutions to the equations

$$ \alpha(r_0) = 0, \qquad r_0 < 0 $$

$$ \beta(t_0) = 0, \qquad t_0 > 0 $$

$$ \alpha'(r_1) = 0, \qquad r_1 < 0 $$

$$ \beta'(t_1) = 0, \qquad t_1 > 0 $$

exist in such a manner that

$$ r_0 + t_0 < 0 . $$

We deal first with $\alpha(r)$. It must be shown that $\alpha(r)$ has the appearance of curve A of Fig. 9.
That is, there is an $r_0 < 0$ and an $r_1 < 0$ such that

$$ \alpha(r_0) = 0 $$

and

$$ \alpha'(r_1) = 0 . $$

We shall show that $\alpha'(0)$ is positive for noise powers less than some critical value, that
$\alpha(r)$ is convex upward, and that $\alpha(r)$ takes on a positive value for $r > -1/2$. These conditions
provide the desired result.

Some of the properties of $\alpha(r)$ are easily calculated. From Eqs. (23a), (31b), and (40),

$$ \alpha(r) = \tfrac{1}{2} \ln \int p_n(n)^{1+2r}\,dn + rR $$

$$ \alpha(0) = 0 $$

$$ \alpha(-\tfrac{1}{2}) = \infty $$

unless $p_n(n) = 0$ for some interval of nonzero length. From an engineering standpoint, this is
impossible since some noise will always be present and should be included in any realistic model.
Further,

$$ \alpha'(r) = R + \frac{\int p(n)^{1+2r} \ln p(n)\,dn}{\int p(n)^{1+2r}\,dn} $$

$$ \alpha'(0) = R + \int p(n) \ln p(n)\,dn = R - H(N) . $$

If R exceeds H(N), the entropy of the noise, then $\alpha'(0)$ will be positive. $\alpha(r)$ is convex upward:

$$ \alpha''(r) = \frac{2 \int p(n)^{1+2r} [\ln p(n)]^2\,dn \cdot \int p(n)^{1+2r}\,dn}{\left[ \int p(n)^{1+2r}\,dn \right]^2} - 2 \left[ \frac{\int p(n)^{1+2r} \ln p(n)\,dn}{\int p(n)^{1+2r}\,dn} \right]^2 . $$

Application of the Schwarz inequality to the numerator gives the desired result that $\alpha''(r) \ge 0$.
Lemma 3 in Appendix A provides the proof that $r_0$ and $r_1$ do exist.

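Turning now to $\beta(t)$, we obtain several of its properties:

$$ \beta(t) = tR + \tfrac{1}{2} \ln \int p(n)\,p(n+\delta)^{2t}\,dn + \ln D $$

$$ \beta(0) = \ln D $$

$$ \beta'(t) = R + \frac{\int p(n)\,p(n+\delta)^{2t} \ln p(n+\delta)\,dn}{\int p(n)\,p(n+\delta)^{2t}\,dn} $$

$$ \beta''(t) = \frac{2 \int p(n)\,p(n+\delta)^{2t} [\ln p(n+\delta)]^2\,dn \cdot \int p(n)\,p(n+\delta)^{2t}\,dn}{\left[ \int p(n)\,p(n+\delta)^{2t}\,dn \right]^2} - 2 \left[ \frac{\int p(n)\,p(n+\delta)^{2t} \ln p(n+\delta)\,dn}{\int p(n)\,p(n+\delta)^{2t}\,dn} \right]^2 . $$

Again applying the Schwarz inequality, we have the result that $\beta''(t) \ge 0$.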


These properties of $\beta(t)$ show that it has one of the three forms illustrated in Fig. 10. As
this figure points out, only form C satisfies the conditions

$$ \beta(t_0) = 0, \qquad t_0 > 0 $$

and

$$ \beta'(t_1) = 0, \qquad t_1 > 0 . $$

Unfortunately, the specific form of p(n) must be considered before it can be definitely established
that $\beta(t)$ has form C. In addition, the requirement that $t_0 + r_0 < 0$ cannot be established
without specifying p(n).

We therefore turn to a specific form for p(n), the Gaussian form, which will be studied in detail
because of its practical interest.

I.  Gaussian  Noise 

Since Gaussian white noise is that most commonly encountered in practice, we shall discuss
it in detail. The noise vector has independent components, each determined according to the
probability density function

$$ p_n(n) = (2\pi\sigma^2)^{-1/2} \exp[-n^2/2\sigma^2] . \tag{56} $$

Using this density function, the various moment generating functions can be computed by simple
integration. From Eqs. (23a) and (23b),

$$ \gamma_1(r) = \int p(n)^{1+r}\,dn = \int (2\pi\sigma^2)^{-(1+r)/2} \exp[-n^2(1+r)/2\sigma^2]\,dn = [(2\pi\sigma^2)^r (1+r)]^{-1/2} \tag{57} $$

$$ \gamma_2(t) = \int p(n)\,p(n+\delta)^t\,dn = \int (2\pi\sigma^2)^{-(1+t)/2} \exp\{-[n^2 + t(n+\delta)^2]/2\sigma^2\}\,dn = [(2\pi\sigma^2)^t (1+t)]^{-1/2} \exp[-\delta^2 t/2\sigma^2(1+t)] . \tag{58} $$

Thus, using Eqs. (39) and (40),

$$ \alpha(r) = r\left(R - \tfrac{1}{2}\ln 2\pi\sigma^2\right) - \tfrac{1}{4}\ln(1 + 2r) \tag{59} $$

and

$$ \beta(t) = t\left(R - \tfrac{1}{2}\ln 2\pi\sigma^2\right) - \tfrac{1}{4}\ln(1 + 2t) - \frac{\delta^2 t}{2\sigma^2(1 + 2t)} + \ln D . \tag{60} $$

Because of the transcendental nature of these equations, it is not possible to solve them
explicitly for $r_0$ and $t_0$. However, solutions can be found for $r_1$ and $t_1$:

$$ r_1 = \frac{1 - 2C}{4C} $$

$$ t_1 = -\frac{1}{2} + \frac{1 \pm \sqrt{8CS + 1}}{8C} $$

where

$$ C = R - \tfrac{1}{2}\ln 2\pi\sigma^2 $$

and

$$ S = \frac{\delta^2}{\sigma^2} . $$

Although it will not always do so, the positive sign in front of the square root is the only one which can lead
to a positive $t_1$ for positive C and S.
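The closed-form stationary points can be verified by differentiating the Gaussian expressions of Eqs. (59) and (60) numerically. This is a small Python sketch under assumed values of C, S, and D; the helper names and the central-difference step are choices made here for illustration, not part of the report.

```python
import math

def alpha(r, C):
    """Eq. (59), with C = R - (1/2) ln(2 pi sigma^2)."""
    return r * C - 0.25 * math.log(1 + 2 * r)

def beta(t, C, S, D):
    """Eq. (60), with S = delta^2 / sigma^2."""
    return t * C - 0.25 * math.log(1 + 2 * t) - S * t / (2 * (1 + 2 * t)) + math.log(D)

def r1_closed(C):
    return (1 - 2 * C) / (4 * C)

def t1_closed(C, S):
    # Positive sign of the square root, the only one that can give t1 > 0.
    return -0.5 + (1 + math.sqrt(8 * C * S + 1)) / (8 * C)

def deriv(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical values for illustration only.
C, S, D = 1.0, 4.0, 2
r1 = r1_closed(C)
t1 = t1_closed(C, S)

alpha_slope_at_r1 = deriv(lambda r: alpha(r, C), r1)
beta_slope_at_t1 = deriv(lambda t: beta(t, C, S, D), t1)
# For the Gaussian case alpha'(0) = C - 1/2, which equals R - H(N).
alpha_slope_at_0 = deriv(lambda r: alpha(r, C), 0.0)
```

Both slopes vanish to within the finite-difference error, and `alpha_slope_at_0` reproduces $C - 1/2$, consistent with $\alpha'(0) = R - H(N)$ since $H(N) = \tfrac{1}{2}\ln 2\pi e\sigma^2$ for Gaussian noise.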

It was not possible to obtain closed-form conditions under which $\beta(t_1)$ and $t_0 + r_0$ are negative.
In view of these difficulties, as well as the complexity of the bounds to $N_c$, $N_i^+$, and $N_i^-$,
we have plotted the bounds as a function of $R - \tfrac{1}{2}\ln 2\pi\sigma^2$, the value of the metric when the noise
sample is zero. The three bounds, as well as their sum N, are plotted in Fig. 11(a-d), with the
ratio $\delta^2/\sigma^2$ as a parameter. In plotting these curves, D was set equal to 2 and $T_0$ was chosen
to be one-half the constant $R - \tfrac{1}{2}\ln 2\pi\sigma^2$. These curves will be discussed in the next section.
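Although $r_0$ and $t_0$ have no closed form, they are easy to find numerically, after which the existence conditions can be tested for any particular operating point. The Python sketch below uses simple bisection on the Gaussian expressions of Eqs. (59) and (60). The values of C and S are assumptions chosen for illustration (D = 2 follows the plots), and the bracketing intervals rely on the shapes described in Sec. III-H.

```python
import math

def alpha(r, C):
    return r * C - 0.25 * math.log(1 + 2 * r)

def beta(t, C, S, D):
    return t * C - 0.25 * math.log(1 + 2 * t) - S * t / (2 * (1 + 2 * t)) + math.log(D)

def bisect_root(f, lo, hi, tol=1e-12):
    """Bisection on a bracket [lo, hi] over which f changes sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical operating point for illustration only.
C, S, D = 0.6, 16.0, 2
r1 = (1 - 2 * C) / (4 * C)
t1 = -0.5 + (1 + math.sqrt(8 * C * S + 1)) / (8 * C)

# alpha is large and positive near r = -1/2 and negative at r1, so r0 lies between them.
r0 = bisect_root(lambda r: alpha(r, C), -0.5 + 1e-9, r1)
# beta(0) = ln D > 0 and beta(t1) < 0 for form C, so t0 lies in (0, t1).
t0 = bisect_root(lambda t: beta(t, C, S, D), 0.0, t1)

conditions_hold = alpha(r1, C) < 0 and beta(t1, C, S, D) < 0 and r0 + t0 < 0
```

If `bisect_root` raises `ValueError`, the assumed operating point does not produce the required sign change, which corresponds to $\beta(t)$ not having form C.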

J.  Summary 

The  results  of  this  section  can  be  summarized  in  the  following  theorem: 

Theorem. 

If D-level, sequentially involved parameters are decoded from data perturbed by additive
noise with probability density given by p(y|z), a monotone-decreasing symmetric function of
|y - z|, and if each noise-free data point along an incorrect path differs from the corresponding
noise-free point on the correct path by at least $\delta$, the average number of computations to decode
a branch is bounded by the sum of the three expressions for $N_c$, $N_i^+$, and $N_i^-$ given in Sec. III-G.

The  fact  that  this  bound  is  a  constant,  independent  of  the  depth  of  the  reference  node,  indi¬ 
cates  that  as  long  as  the  conditions  of  the  theorem  hold,  the  average  number  of  computations  for 
decoding  a  branch  is  fixed  for  all  depths. 

To better understand the bounds calculated in this section, we discuss them in detail for
Gaussian noise. We see in Fig. 11(a) that the number of computations for decoding correct
branches that stem from the reference node is small whenever the bias constant, $R - \tfrac{1}{2}\ln 2\pi\sigma^2$,
is large, and is very large whenever the constant is small. This is due to the fact that whenever
the bias is too small, the correct path will always be negative and will therefore appear like an
incorrect path to the decoder. Since considerations along the correct path do not involve points
along incorrect paths, the distance of these paths from the correct one does not enter the bound.

The contribution to the average number of computations along incorrect paths, when the
threshold is above $-T_0$, increases with the bias constant and does so more rapidly as $\delta^2/\sigma^2$
increases. This is seen in Fig. 11(b). If the bias constant increases, more incorrect nodes will
belong to this group and will appear correct to the decoder. If $\delta^2/\sigma^2$ is small, the correct path
will look very similar to the incorrect paths, and many branches will be traversed before
conditions bring about a return to the correct path.

Finally, we consider the average number of computations along incorrect paths when the
threshold is below $-T_0$. When the bias is small, most of the incorrect nodes will belong to this group,
and if it is very small the correct path will also be decreasing, thereby causing these incorrect
paths to be investigated frequently. Thus there is a sharp increase in $N_i^-$ for small bias, as can
be seen in Fig. 11(c). If the bias constant is very large, an incorrect path in this category would
be investigated only if a very large noise sample occurs. In the event that it does occur, very
many computations would be needed to overcome it, especially if $\delta^2/\sigma^2$ is too small.

In Fig. 11(d), the composite curves are plotted. The choice of the bias constant does not
seem to be too critical, so long as $\delta^2/\sigma^2$ is not too small. As this quantity decreases, the
sensitivity of N to the bias constant increases.

IV.  PROBABILITY  OF  ERROR 
A.  Introduction 

In this section, we shall compute a bound to the probability of reaching an incorrect terminal
node that is satisfactory to the decoder. We shall see that it decreases exponentially with W, the
length of the tail, and that the exponent improves as the incremental bias $\delta$ increases for a fixed
probability density function. The error-probability bound is derived with the bias constant as a
parameter, and the effects of varying it are considered by means of curves derived for Gaussian
noise.

Fig. 11. Behavior of N and its components vs bias constant: (a) $N_c$ vs bias constant; (b) $N_i^+$ vs bias constant; (c) $N_i^-$ vs bias constant; (d) N vs bias constant.

B.  Events  Leading  to  Errors 

Recall that in Sec. III we introduced the notion of a reference node, and computed the average
number of computations required to accept the correct branch stemming from it. The finite size
of the tree was ignored, since the infinite size case provided an upper bound and was simpler to
consider. However, the finite size of the tree could cause the decoder to complete the hypothesis
vector along an incorrect path before the metric on the incorrect path has begun to fall.
Consequently, the finite size of the tree plays a role in producing errors and must be considered when
calculating the error probability.

In this consideration of the finite tree, the nodes along the correct path are no longer
homogeneous. Therefore, each correct node must be considered separately, and the theorem on the
probability of a union of events must be used to bound the total error probability. As before, we
consider each node along the correct path as the reference node separately. Because of the
inhomogeneity of the nodes along the correct path, we must define $T_1(l)$ as the highest threshold
below the metric value at the reference node for depth l. Since we are again considering only
changes in the metric, we may arbitrarily choose its reference. For convenience, we choose
the metric to be zero at the reference node. Thus $-T_0 < T_1(l) < 0$.

There are two situations from which errors can arise. Suppose, first, that there is an
incorrect path leaving the correct path at depth l with a metric which remains above $T_1(l)$ for the
entire tree duration after depth l, and that this path is tested by the decoder before the correct
one. It is clear that this path will appear satisfactory to the decoder regardless of the behavior
of the metric on the correct path. Unless the metric on a path under test falls below $T_1(l)$, the
decoder will never return to the node to change its incorrect decision.

Let $Q_l^+$ be the probability that there is an incorrect path remaining above $T_1(l)$ for all depths
greater than l. Let $Q^+$ be the total contribution to the error probability by situations of this
type. Since the probability of a union of events is bounded by the sum of the probabilities of the
individual events, we have

$$ Q^+ \le \sum_{l=1}^{L} Q_l^+ . \tag{61} $$

The other situation resulting in error takes place if the metric on the correct path at some
node beyond l falls below a threshold value $T \le T_1(l)$. If the correct path falls below T and
there should be an incorrect path leaving the correct path at node l and remaining above $T - T_0$
until the end of the tree, difficulty might arise. For when the metric on the correct path falls
below some T, other paths will be tried until one is found which is above $T - T_0$. Such a path
will be followed until it falls below $T - T_0$. If it does not, an error will occur.

Define, therefore, $Q_l^-$ as the probability that the metric on the correct path starting at depth
l falls below some threshold value $T \le T_1(l)$ and that there is an incorrect path leaving the
correct path at depth l with metric remaining above $T - T_0$. Then, using the theorem on the
probability of a union of events, we bound the total contribution to the error probability from this
second error situation, $Q^-$, by the sum

$$ Q^- \le \sum_{l=1}^{L} Q_l^- . \tag{62} $$

Finally, the total error probability is bounded by the sum of the two contributions so that

$$ P_e \le Q^+ + Q^- . \tag{63} $$

C.  Chernoff  Bounds 

First, we shall compute a bound to $Q_l^+$, the probability that there is some incorrect path
with a metric remaining above $T_1(l)$ for the entire tree duration. We consider the probability
that a particular incorrect path remains above $T_1(l)$, using for the computation that path most
likely to remain above $T_1(l)$. We can then multiply by the number of paths to obtain a bound on
the desired probability for some path. This is the same procedure used in the calculation of N.

For the particular path used in the computation, we desire the probability that its metric
remains above $T_1(l)$ at depths l, l + 1, ..., L. This is the intersection of the events that the
metric is above $T_1(l)$ at each depth individually. However, the probability of an intersection of
events is upper bounded by the probability of any one of the component events. Since the
probability that an incorrect path is above $T_1$ at depth k decreases with increasing k, we choose as
the event in this bound the one corresponding to depth L + W, where L is the depth of the tree
and W is the number of observations remaining after the last node has been reached.
Consequently,

$$ Q_l^+ \le (D - 1) D^{L-l-1} P_{W+L-l}[T_1(l)] , \tag{64a} $$

where $P_{W+L-l}[T_1(l)]$ is the probability that a particular path, composed of W + L - l incorrect
noise-free data points differing from the correct noise-free data points by $\delta$, remains above
$T_1(l)$. If we recall that $P_{W+L-l}(T)$ increases with decreasing T, we can eliminate $T_1(l)$ by
the inequality

$$ P_{W+L-l}[T_1(l)] \le P_{W+L-l}(-T_0) . \tag{64b} $$

But we have bounded $P_k(T)$ in Sec. III-D. Thus, from Eqs. (14), (25), and (64b),

$$ P_{W+L-l}[T_1(l)] \le \gamma_{W+L-l}(t) \exp[tT_0] \le \gamma_2^{W+L-l}(t) \exp\{t[(W + L - l)R + T_0]\} . \tag{65} $$

We now turn to $Q_l^-$, the probability that the correct path falls below $T_1(l)$ and that some
incorrect path starting at depth l falls at most $T_0$ below the smallest value to which the correct
path falls. This quantity is bounded by the sum over T of the conditional probability $Q_l^-(T)$ that
the correct path falls below T while some incorrect path remains above $T - T_0$, that is,

$$ Q_l^- \le \sum_{j=0}^{\infty} Q_l^-(T_1 - jT_0) . \tag{66} $$

Again we consider that there are $(D - 1)D^{L-l-1}$ incorrect paths with lower probabilities than
those on a particular incorrect path, and again we assume that the noise-free data points on this
particular path differ from those on the correct path by $\delta$ at all time intervals. By the same
reasons used earlier,

$$ Q_l^-(T) \le (D - 1) D^{L-l-1} \sum_{k=1}^{L+W-l} \Pr(M_k \le T,\; M'_{L+W-l} \ge T - T_0) \tag{67} $$

where the summation is a bound to the probability that the correct path falls below T for some
depth beyond l and the particular incorrect path remains above $T - T_0$ for all depths beyond l.

Finally, we can employ the Chernoff bound to the summand obtained in Sec. III-D. From
Eqs. (67), (21), and (27),

$$ Q_l^-(T) \le (D - 1) D^{L-l-1} \sum_{k=1}^{L+W-l} \gamma_{k,\,L+W-l}(r,t) \exp[-rT - t(T - T_0)] $$

$$ \le (D - 1) D^{L-l-1} \sum_{k=1}^{L+W-l} \gamma_3^{k}(r,t)\, \gamma_2^{L+W-l-k}(t) \exp\{r(kR - T) + t[(L + W - l - k)R - T + T_0]\} . \tag{68} $$

In the next section, we consider these sums.


D.  Carrying  Out  the  Sums 

The results of this section can be summarized by the two inequalities. From Eqs. (61),
(64), and (65), we obtain

$$ Q^+ \le \sum_{l=1}^{L} (D - 1) D^{L-l-1} \gamma_2^{W+L-l}(t) \exp\{t[(W + L - l)R + T_0]\} , \qquad t \ge 0 \tag{69} $$

and, from Eqs. (62), (66), and (68), we conclude that

$$ Q^- \le \sum_{l=1}^{L} \sum_{j=0}^{\infty} \sum_{k=1}^{L+W-l} (D - 1) D^{L-l-1} \gamma_3^{k}(r,t)\, \gamma_2^{W+L-l-k}(t) \exp\{r(kR + jT_0) + t[(W + L - l - k)R + (j + 2)T_0]\} , \qquad t \ge 0,\; r \le 0 \tag{70} $$

where we have eliminated $T_1(l)$ by using instead 0 or $-T_0$, whichever provided an upper bound.
Amending these results with one expressed in Eq. (30), we obtain

$$ Q^+ \le \sum_{l=1}^{L} (D - 1) D^{L-l-1} \exp\{(W + L - l)[\tfrac{1}{2}\mu_2(2t) + tR] + tT_0\} , \qquad t \ge 0 \tag{71} $$

$$ Q^- \le \sum_{l=1}^{L} \sum_{j=0}^{\infty} \sum_{k=1}^{L+W-l} (D - 1) D^{L-l-1} \exp\{k[\tfrac{1}{2}\mu_1(2r) + rR] + k[\tfrac{1}{2}\mu_2(2t) + tR]\} \exp\{(W + L - l - k)[\tfrac{1}{2}\mu_2(2t) + tR] + jrT_0 + (j + 2)tT_0\} , \qquad t \ge 0,\; r \le 0 . \tag{72} $$

These sums may be carried out with a single value for r and t, but a better result is achieved
if an attempt is made to choose values closer to the optimum values for each term in the same
manner as in Sec. III.

We consider $Q^+$ first. Recalling that

$$ \beta(t) = \tfrac{1}{2}\mu_2(2t) + tR + \ln D \qquad \text{[Eq. (39)]} $$

and letting m = W + L - l, Eq. (71) can be rewritten

$$ Q^+ \le \frac{D - 1}{D^{W+1}} \sum_{m=W}^{W+L-1} \exp[m\beta(t) + tT_0] , \qquad t \ge 0 . \tag{73} $$

The variable t is a function of m and should be chosen according to

$$ \beta'(t) = -\frac{T_0}{m} \tag{74} $$

for each value of the index. However, such a procedure would complicate the summations
unnecessarily. Instead, we note that for large W (and we are chiefly interested in the exponential
behavior with W), $T_0/m$ in Eq. (74) approaches zero. Hence t becomes essentially constant and
equal to $t_1$. Thus the sum can be carried out for $t = t_1$ to obtain

$$ Q^+ \le \frac{D - 1}{D} \, \frac{1 - \exp[L\beta(t_1)]}{1 - \exp[\beta(t_1)]} \, e^{t_1 T_0} \exp\{-W[\ln D - \beta(t_1)]\} . \tag{75} $$
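Equation (75) is the geometric sum of Eq. (73) evaluated at $t = t_1$, which can be confirmed numerically. The Python sketch below compares the two forms; the values of W, L, D, $\beta(t_1)$, $t_1$, and $T_0$ are hypothetical and serve only to exercise the identity.

```python
import math

def q_plus_direct(W, L, D, beta_t1, t1, T0):
    """Eq. (73) at t = t1: ((D-1)/D^(W+1)) * sum_{m=W}^{W+L-1} exp[m beta(t1) + t1 T0]."""
    total = sum(math.exp(m * beta_t1 + t1 * T0) for m in range(W, W + L))
    return (D - 1) / D ** (W + 1) * total

def q_plus_closed(W, L, D, beta_t1, t1, T0):
    """Closed form of Eq. (75)."""
    geometric = (1 - math.exp(L * beta_t1)) / (1 - math.exp(beta_t1))
    decay = math.exp(-W * (math.log(D) - beta_t1))
    return (D - 1) / D * geometric * math.exp(t1 * T0) * decay

# Hypothetical values for illustration only; beta(t1) must be negative.
W, L, D, beta_t1, t1, T0 = 5, 8, 2, -0.2, 1.5, 0.3
direct_value = q_plus_direct(W, L, D, beta_t1, t1, T0)
closed_value = q_plus_closed(W, L, D, beta_t1, t1, T0)
```

The two values coincide up to rounding, and the factor `decay` makes explicit that the bound falls off as $\exp\{-W[\ln D - \beta(t_1)]\}$ with the tail length.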


Turning now to $Q^-$, we apply a fairly loose bounding technique for the sake of simplicity.
We remark, however, that the two-value method used in all previous calculations could be
applied instead, but owing to the triple sum to be performed and the fact that the index at which
the approximation changes can fall outside the summation limits as well as inside, the result
rapidly becomes cumbersome.

Recalling that

$$ \alpha(r) = \tfrac{1}{2}\mu_1(2r) + rR \qquad \text{[Eq. (40)]} $$

and

$$ \beta(t) = \tfrac{1}{2}\mu_2(2t) + tR + \ln D \qquad \text{[Eq. (39)]} $$

and letting m = L - l, we rewrite the bound to $Q^-$ of Eq. (72),

$$ Q^- \le \frac{D - 1}{D^{W+1}} \sum_{j=0}^{\infty} \sum_{m=0}^{L-1} \sum_{k=1}^{W+m} \exp[k\alpha(r) + jrT_0 + (W + m)\beta(t) + (j + 2)tT_0] \tag{76} $$

for $t \ge 0$ and $r \le 0$.

Noting first that the choice of r depends only on k and j while the choice of t depends on
m and j, we consider choosing both these parameters for a fixed j. The optimum value of r
could be chosen for each value of k and j according to

$$ \alpha'(r) = -\frac{jT_0}{k} . \tag{77a} $$

However, this would make the analysis very complex. Instead, we choose a single value of r
and optimize the result over this parameter. Similarly, a different t could be chosen for each
value of m and j according to

$$ \beta'(t) = -\frac{(j + 2)T_0}{W + m} . \tag{77b} $$

Again this leads to unmanageable detail. We note that the exponent proportional to t in Eq. (76)
grows without bound as the sum on j proceeds. Therefore, we choose one value of t in terms
with a small value of j and set t = 0 for the remaining terms.

We now note that when $\alpha(r)$ is negative, the dominant term in the sum on k is that
corresponding to k = 1 and that the dominant term in the sum on m is that corresponding to m = 0.
Thus Eq. (76) can be bounded to obtain

$$ Q^- \le \frac{D - 1}{D^{W+1}} \sum_{j=0}^{\infty} \sum_{m=0}^{L-1} (W + m) \exp[\alpha(r) + jrT_0 + W\beta(t) + (j + 2)tT_0] $$

$$ = \frac{L(2W + L - 1)(D - 1)}{2D^{W+1}} \sum_{j=0}^{\infty} \exp[\alpha(r) + jrT_0 + W\beta(t) + (j + 2)tT_0] . \tag{78a} $$
Thus the exponential behavior with respect to the tail length W is controlled by $\beta(t)$ alone.
Since $\beta(t)$ is a minimum for $t = t_1$, we choose t according to

$$ t = \begin{cases} t_1, & j < j_0 \\ 0, & j \ge j_0 . \end{cases} \tag{78b} $$

Therefore, from Eqs. (78a) and (78b),

$$ Q^- \le \frac{L(2W + L - 1)(D - 1)}{2D^{W+1}} \left\{ \sum_{j=0}^{j_0 - 1} \exp[\alpha(r) + jrT_0 + W\beta(t_1) + (j + 2)t_1 T_0] + \sum_{j=j_0}^{\infty} \exp[\alpha(r) + jrT_0 + W \ln D] \right\} . \tag{79} $$

We change from $t = t_1$ to $t = 0$ at the term for which the second value of t gives a smaller
value than the first. This occurs for

$$ \frac{W[\ln D - \beta(t_1)]}{t_1 T_0} - 2 \le j_0 < \frac{W[\ln D - \beta(t_1)]}{t_1 T_0} - 1 . \tag{80} $$
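The switching index $j_0$ of Eq. (80) can be computed directly and checked against the rule that motivates it: for $j < j_0$ the exponent obtained with $t = t_1$ is the smaller one, while from $j_0$ onward the exponent obtained with $t = 0$ is no larger. The following Python sketch uses hypothetical values of W, D, $\beta(t_1)$, $t_1$, and $T_0$, and the function names are illustrative.

```python
import math

def crossover_index(W, D, beta_t1, t1, T0):
    """Smallest nonnegative integer j0 satisfying Eq. (80)."""
    x = W * (math.log(D) - beta_t1) / (t1 * T0)
    return max(0, math.ceil(x - 2))

def exponent_with_t1(j, W, beta_t1, t1, T0):
    """The t-dependent part of the exponent in Eq. (78a) for t = t1."""
    return W * beta_t1 + (j + 2) * t1 * T0

# Hypothetical values for illustration only.
W, D, beta_t1, t1, T0 = 5, 2, -1.754, 1.546, 0.3
j0 = crossover_index(W, D, beta_t1, t1, T0)

# With t = 0 the same part of the exponent is W * beta(0) = W ln D.
exponent_with_t0 = W * math.log(D)
before = exponent_with_t1(j0 - 1, W, beta_t1, t1, T0)
at_switch = exponent_with_t1(j0, W, beta_t1, t1, T0)
```

For these assumed values `before` lies below `exponent_with_t0` and `at_switch` lies above it, so the switch at $j_0$ selects the smaller exponent on each side.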


The first summation in the braces of Eq. (79) can be bounded by the product of the number
of terms and the largest term. If this bound is employed, the sign of the sum $r + t_1$ delineates
two cases which must be considered separately. Thus, from Eq. (79) and the relationship

$$ \sum_{j=j_0}^{\infty} x^j = \frac{x^{j_0}}{1 - x} $$

we obtain for $r + t_1 > 0$

$$ Q^- \le \frac{L(2W + L - 1)(D - 1)}{2D^{W+1}} \left\{ j_0 \exp[\alpha(r) + W\beta(t_1) + 2t_1 T_0 + j_0(r + t_1)T_0] + \frac{\exp[\alpha(r) + W \ln D + j_0 rT_0]}{1 - e^{rT_0}} \right\} $$

$$ \le \frac{L(2W + L - 1)(D - 1)}{2D} \left\{ j_0 \exp\left\{\alpha(r) + t_1 T_0 - rT_0 + \frac{rW}{t_1}[\ln D - \beta(t_1)]\right\} + \frac{\exp\left\{\alpha(r) - 2rT_0 + \frac{rW}{t_1}[\ln D - \beta(t_1)]\right\}}{1 - e^{rT_0}} \right\} \tag{81} $$

for any $r \le 0$, and if $r + t_1 \le 0$

$$ Q^- \le \frac{L(2W + L - 1)(D - 1)}{2D} \left\{ j_0 \exp\{\alpha(r) + 2t_1 T_0 - W[\ln D - \beta(t_1)]\} + \frac{\exp\left\{\alpha(r) - 2rT_0 + \frac{rW}{t_1}[\ln D - \beta(t_1)]\right\}}{1 - e^{rT_0}} \right\} \tag{82} $$

for any $r \le 0$.

In the bound to $Q^-$, the chief interest is the part of the exponent proportional to W, the
length of the tail. When $r + t_1 > 0$, the coefficient of W is given by

$$ \frac{r}{t_1}[\ln D - \beta(t_1)] $$

but when $r + t_1 < 0$, the bound has two terms each with a different coefficient. In this case,
however, $r/t_1 < -1$ so that the coefficient

$$ -[\ln D - \beta(t_1)] $$

is the dominant one.

The choice of r must now be made. Since we required $\alpha(r)$ to be nonpositive in the bounding
process, and since the best exponent is obtained when r is as negative as possible, we choose
$r = r_0$. With these bounds on $Q^+$ and $Q^-$, we can proceed to the final step.

The bounds to $Q^+$ and $Q^-$, when summed, give a bound to $P_e$, the probability of reaching the
end of the tree on a path other than the correct one. Because of the complexity of the
expressions, we cannot discuss them in general. For Gaussian noise, however, we can plot the
exponent as a function of the various parameters, and then discuss its behavior for this important
case.


E.  Gaussian  Noise 

Using the moment generating functions found under the differential bias assumption in
Sec. III, we can consider in detail the error probability for Gaussian noise. If the expressions
are examined, it becomes clear that $P_e$ decreases exponentially with W, the length of the tail
beyond the last node of the tree. This is due to the fact that at earlier depths, the number of
alternatives is growing exponentially. Thus the additional data points which become available
as the process continues do not contribute to lowering the error probability.

Fig. 12. Exponent of $Q^+$ vs bias constant.

Since $P_e$ can be made as small as desired by increasing W, we plot the coefficient of W as
a function of the bias constant for various values of the ratio $\delta^2/\sigma^2$. The part of $P_e$ due to the
first type of error is plotted in Fig. 12. For a fixed value of $\delta^2/\sigma^2$, the exponent becomes more
negative as the bias constant decreases. This is due to the fact that for a larger value of the
bias constant, it is more likely that the metric on an incorrect path will remain above the
reference metric value for the entire tree duration. As $\delta^2/\sigma^2$ increases, the whole curve shifts to
more negative values.

The remaining portion of $P_e$ due to errors of the second kind has a peaked behavior and is
plotted in Fig. 13. In the events leading to errors of the second kind, the joint behavior of the
metric  on  the  correct  path  and  on  incorrect  paths  is  involved.  Since  the  probability  that  an  in¬ 
correct  path  remains  above  a  particular  threshold  increases  with  increasing  bias  constant  and 
the  probability  that  the  correct  path  falls  below  the  threshold  decreases  with  increasing  bias 
constant,  there  are  regions  in  which  each  situation  dominates.  Thus  there  is  a  best  error  ex¬ 
ponent  for  the  second  type  of  error  at  an  intermediate  value  of  the  bias  constant.  We  note  that 
the  exponent  for  large  bias  constant  is  the  same  for  errors  of  the  first  kind  as  for  the  second 
kind,  and  that  for  small  bias  constant,  errors  of  the  second  kind  predominate.  Hence  the  curves 
of  Fig.  13  also  display  the  behavior  of  the  total  error  exponent. 

Finally,  the  exponent  for  the  optimum  value  of  the  bias  constant  is  plotted  in  Fig.  14.  It  is 
seen  to  have  the  usual  behavior  for  exponents  of  this  type. 

F.  Probability  of  First  Error 

Also of interest in many measurement problems is the probability of making a first
error, rather than the probability of making any error at all. However, it is clear from
Sec. IV-B that the probability of making an error at depth l, $P_l$, is upper bounded according to

$$ P_l \le Q_l^+ + Q_l^- $$

where $Q_l^+$ and $Q_l^-$ are given by Eqs. (64) and (66). Thus the summand of Eq. (71) and the sum on
j and k in Eq. (72) provide the desired bound to $Q_l^+$ and $Q_l^-$, respectively.

Fig. 13. Exponent of $Q^-$ and $P_e$ vs bias constant.

Fig. 14. Optimum exponent of $P_e$ vs $\sigma^2/\delta^2$.

Thus, after a rearranging of terms and setting $t = t_1$,

$$ Q_l^+ \le \frac{D - 1}{D} \, e^{t_1 T_0} \exp\left[-W\left\{\ln D - \beta(t_1) - \frac{L - l}{W}\beta(t_1)\right\}\right] . \tag{83} $$

Since $\beta(t_1)$ is negative, we see that the exponent is better if we consider the location of the first
error closer to the origin of the tree.

Similarly,  we  consider  .  From  Eq.  (7  6), 

00  W+L-f 

Q,\<  V  D'W  2  E  exp[ka(r)  +  jrTQ+  (W  +  L-l)  jS(t)  +  (j  +  2)  tTQ]  .  (84) 
j=0  k=l 

Again  we  note  that  for  a  (r)  ^  0,  the  dominant  term  in  the  sum  on  k  is  that  corresponding  to 
k  =  1.  Hence 


Q~4  V  D"W  £  (W  +  L  -  l)  exp  [a  (r)  +  jrTQ  +  (W  +  L  -  l )  j8(t)  +  (j  +  2)  tTj  .  (85) 
j=0 

Since  the  exponential  behavior  is  again  controlled  by  j3(t),  we  can  use  the  same  rationale 
for  choosing  t.  Thus  let 


t  = 


t .  j  <  j 

1  j  jQ 


0  j  ^jr 


[Eq.  (78b)] 


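Thus, from Eqs. (85) and (78b),

$$Q_\ell^{-} \le \frac{(D-1)(W+L-\ell)}{D^{W+1}}\,e^{\alpha(r)} \left\{ \sum_{j=0}^{j_0-1} \exp\left[jrT_0 + (W+L-\ell)\,\beta(t_1) + (j+2)\,t_1T_0\right] + \sum_{j=j_0}^{\infty} \exp\left[jrT_0 + (W+L-\ell)\ln D\right] \right\} . \tag{86}$$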


We choose $j_0$ according to

$$\frac{(W+L-\ell)\left[\ln D - \beta(t_1)\right]}{t_1T_0} - 2 < j_0 \le \frac{(W+L-\ell)\left[\ln D - \beta(t_1)\right]}{t_1T_0} - 1 . \tag{87}$$


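Consequently for $r + t_1 > 0$, we obtain from Eqs. (86) and (87),

$$Q_\ell^{-} \le \frac{(D-1)(W+L-\ell)}{D^{W+1}}\,e^{\alpha(r)} \left\{ j_0 \exp\left[(W+L-\ell)\,\beta(t_1) + 2t_1T_0 + (j_0-1)(r+t_1)T_0\right] + \frac{\exp\left[(W+L-\ell)\ln D + j_0rT_0\right]}{1-e^{rT_0}} \right\}$$

$$\le \frac{D-1}{D}\,(W+L-\ell)\,e^{\alpha(r)} \left\{ j_0 \exp\left(2t_1T_0 + \frac{Wr}{t_1}\left\{\ln D - \beta(t_1) + \frac{L-\ell}{W}\left[\left(1+\frac{t_1}{r}\right)\ln D - \beta(t_1)\right]\right\}\right) + \frac{\exp\left(-2rT_0 + \frac{Wr}{t_1}\left\{\ln D - \beta(t_1) + \frac{L-\ell}{W}\left[\left(1+\frac{t_1}{r}\right)\ln D - \beta(t_1)\right]\right\}\right)}{1-e^{rT_0}} \right\} \tag{88}$$

for any r < 0.

If, on the other hand, $r + t_1 \le 0$, we obtain

$$Q_\ell^{-} \le \frac{(D-1)(W+L-\ell)}{D^{W+1}}\,e^{\alpha(r)} \left\{ j_0 \exp\left[(W+L-\ell)\,\beta(t_1) + 2t_1T_0\right] + \frac{\exp\left[(W+L-\ell)\ln D + j_0rT_0\right]}{1-e^{rT_0}} \right\}$$

$$\le \frac{D-1}{D}\,(W+L-\ell)\,e^{\alpha(r)} \left\{ j_0 \exp\left(-W\left[\ln D - \beta(t_1) - \frac{L-\ell}{W}\,\beta(t_1)\right] + 2t_1T_0\right) + \frac{\exp\left(-2rT_0 + \frac{Wr}{t_1}\left\{\ln D - \beta(t_1) + \frac{L-\ell}{W}\left[\left(1+\frac{t_1}{r}\right)\ln D - \beta(t_1)\right]\right\}\right)}{1-e^{rT_0}} \right\} . \tag{89}$$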


The exponent in these expressions can now be examined. When $r + t_1 > 0$, the coefficient of W is identical in the two terms and is given by

$$\frac{r}{t_1}\left\{\ln D - \beta(t_1) + \frac{L-\ell}{W}\left[\left(1+\frac{t_1}{r}\right)\ln D - \beta(t_1)\right]\right\} .$$

When $r + t_1 \le 0$, there are two different exponents but it is clear that the more positive, and consequently the dominant, one is

$$-\left[\ln D - \beta(t_1) - \frac{L-\ell}{W}\,\beta(t_1)\right] .$$

The choice of r can be made as in Sec. IV-D, $r = r_0$.

Turning at last to $P_\ell$, we note that for some cases the dominant exponent is given by the bound to $Q_\ell^{-}$ and that in others it is the same for $Q_\ell^{+}$ and $Q_\ell^{-}$. Thus we need consider only the exponent in the bound for $Q_\ell^{-}$.

In these bounds, the main interest lies in the coefficient of W. By comparing Eqs. (88) and (89) with (81) and (82), it is clear that this coefficient in the expression for the probability of first error is better than that for the probability of any error by a term that grows linearly with the distance of the error point from the end of the tree. Thus the probability of an error at a particular point depends not only on the length of the tail, but also on the number of output samples available beyond this point.

G. Summary

The error probability for the sequential measurement technique on a finite tree is bounded by a quantity that decreases exponentially with W. The exponent is calculated in detail for Gaussian noise. This exponent is plotted in Fig. 14 along with a similar bound for a correlation technique discussed in Appendix B. As is evident from the curves, there is a degradation in error probability exponent due to the sequential procedure,† but in the cases considered here, the number of computations is the critical issue and for reasonable values of $\delta^2/\sigma^2$, the sequential procedure is more attractive from this viewpoint.

V.  SIMULATION 

A.  Introduction 

In the preceding two sections, we analyzed the sequential algorithm as applied to measurement problems which satisfy two requirements. First, we required that a tree structure exist and second, we imposed a somewhat abstract assumption describing the relations among the hypotheses in the output space. This assumption was referred to as the differential bias assumption. Under these assumptions, we demonstrated that the sequential algorithm could be used to perform the measurement with a limited number of computations and with an error probability that decreases exponentially with the number of observations not dependent on undetermined parameters.

However, analysis serves only to suggest the operational characteristics of a system; in order to test the model, to verify the hypotheses, and to suggest avenues for further analysis, experiments should be conducted. With this view in mind, an experimental "apparatus" in the form of a simulation program was designed and assembled. A number of experiments were performed and the resultant data indicated that the mathematical model was a satisfactory representation of the experimental model. In this section, we describe in detail the experiments and the results.

B.  Simulation  Objectives 

There  were  several  specific  reasons  for  the  simulation.  In  the  first  place,  the  sequential 
algorithm  itself  is  fairly  complex  and  tracing  through  the  flow  chart  of  Fig.  5  manually  is,  at 
best,  tedious.  A  simulation  that  would  graphically  indicate  the  dynamics  of  the  algorithm  would 
do  much  to  aid  in  its  understanding  and  perhaps  to  suggest  methods  of  improvement. 

A second reason for the simulation was to test the various assumptions used in the analysis. Although the differential bias assumption specifies conditions under which the sequential measurement technique will function satisfactorily, it is difficult to assess, in most situations of practical interest, whether or not it is satisfied. Of course, an exhaustive computational analysis could be employed for a specific measurement situation, but this would produce little understanding of the general class of problems to which it applies.

In addition, although the differential bias assumption is sufficient to prove that the sequential method can be used in measurement problems, it may not be necessary. That is, weaker requirements on the differences between output vectors may still allow the sequential method to be employed. For this reason, and for the previous one, the simulation became desirable. By using the simulation, we could ascertain whether or not the sequential method could be applied to a particular measurement problem.

† The error probability for sequential measurement is lower bounded by the correlation error probability for the last decision.

Finally, we recall the coarseness of the bounds used to obtain the theoretical results. A compilation of data on typical problems would indicate whether these approximations left the essential characteristics of the bounded quantities intact.

In the bulk of the simulation work, the measurement problem considered was that of geophysical exploration. This problem was chosen for simulation because of a need for improved data-processing methods in that area. In addition, the geophysical exploration problem seemed typical of the many measurement situations in which it is difficult to assess the applicability of the algorithm. The model described in Sec. II was used, since it displayed the essential characteristics of the real problem without introducing excessive computational difficulties.

C.  Simulation  Program 

The author was fortunate to have the opportunity to carry out the simulation on a time-shared IBM 7094 computer (Ref. 13). These facilities were available at Project MAC, an M.I.T. research group directed toward improved man-machine communication. Through on-line interaction with the computer, it was possible to observe directly and immediately how the simulator was operating, and to modify it at once whenever a change was necessary. More important, however, the dynamics of the decoder became readily available, thus leading to a significantly improved understanding of the decoder's operation. The availability of a graphic display unit made the dynamics very clear. Examples of the display were presented in Sec. II in connection with the description of the Fano sequential decoding algorithm. By varying the decoder's characteristic constants $(R, T_0)$ and the noise variance, one could observe directly the effects of these variations on the over-all decoding process. Then one could plan intelligently the bulk of the off-line experimental work.

Fig. 15. Main sections of simulation program.

The simulation program is divided into several parts as indicated in Fig. 15, according to the various tasks that must be performed. First, all parameters are set to their initial values. These include various counters to tally the number of computations, the location variable which indicates the current location of the decoder, the choice vector i(n) which indicates the alternative chosen by the decoder at each node, the threshold value, and the metric. In addition, the noise-free data store is emptied. Then the program is set up to read data for the next run. The following quantities are read as inputs:

The  true  values  for  the  unknown  parameters. 

The  probe  signal. 

The  true  noise-free  data  sequence  resulting  from  the  true  parameters  and 
probe  signal.  Reading  this  in  saves  recomputation  when  the  same  noise-free 
data  are  to  be  retested. 

A  set  of  independent  Gaussian  noise  samples  of  zero  mean  and  unit  variance. 

The  set  of  quantization  levels. 

The  noise  variance. 

The  bias  constant. 

The  metric  increment. 

A  series  of  parameters  governing  the  frequency  of  output. 

After these parameters are read in, the noise is scaled according to the variance. It is then added to the true noise-free data sequence to provide noisy data to the decoder. The decoder is then entered.
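The scaling step just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the original simulation program; the function name `make_noisy_data` and the sample values are assumptions introduced here.

```python
import math
import random


def make_noisy_data(noise_free, unit_noise, variance):
    """Scale unit-variance noise to the given variance and add it to the data."""
    sigma = math.sqrt(variance)  # the standard deviation is the scale factor
    return [z + sigma * n for z, n in zip(noise_free, unit_noise)]


# Hypothetical inputs: a short noise-free sequence and unit-variance samples.
rng = random.Random(0)
noise_free = [0.0, 5.567, 0.0, -1.2]
unit_noise = [rng.gauss(0.0, 1.0) for _ in noise_free]
noisy = make_noisy_data(noise_free, unit_noise, variance=1.0e-2)
```

Because the same unit-variance samples are reused and only the scale factor changes, runs at different noise levels differ in noise amplitude but not in noise shape.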

The decoder operates according to the algorithm of the flow chart discussed in Sec. II. At each stage, it computes an increment to the metric according to

$$d_i = \sum_{j=i\nu}^{(i+1)\nu-1} \left[C - (y_j - z_j)^2\right] \tag{90}$$

where C is a constant, $\nu$ is the number of intervals along a tree branch, $y_j$ is the received noisy data at time j, and $z_j$ is the noise-free output consistent with the current hypothesis and the probe signal. The sum is over all those intervals depending on the ith hypothesis, but not on the (i + 1)th.
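As a minimal sketch of Eq. (90), the increment can be computed directly from the received and hypothesized sequences. The function name `metric_increment` and the zero-based indexing of branches and samples are assumptions made for this illustration.

```python
def metric_increment(i, nu, C, y, z):
    """Eq. (90): sum of C - (y_j - z_j)**2 over j = i*nu, ..., (i+1)*nu - 1."""
    return sum(C - (y[j] - z[j]) ** 2 for j in range(i * nu, (i + 1) * nu))


# Hypothetical data: two branches of nu = 2 intervals each.
y = [1.0, 2.0, 3.0, 4.0]   # received noisy data
z = [1.0, 1.0, 3.0, 2.0]   # noise-free output for the current hypothesis
first = metric_increment(0, 2, 1.5, y, z)
second = metric_increment(1, 2, 1.5, y, z)
```

A branch whose hypothesized output stays close to the data gives a positive increment; a large squared error drives the increment negative.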

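We note that this metric is of the form

$$\sum_j \left[R + \ln p_n(y_j \mid z_j)\right]$$

for Gaussian noise. For in that case

$$R + \ln p_n(y_j \mid z_j) = R + \ln\left\{\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-(y_j - z_j)^2/2\sigma^2\right]\right\} = \frac{1}{2\sigma^2}\left[2\sigma^2 R - \sigma^2 \ln 2\pi\sigma^2 - (y_j - z_j)^2\right] . \tag{91}$$

Thus Eqs. (90) and (91) are proportional if

$$C = \sigma^2\left(2R - \ln 2\pi\sigma^2\right) .$$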

Consequently, the requirement that R be greater than the noise entropy introduced in Sec. III-H,

$$R > H(N) = \tfrac{1}{2}\ln\left(2\pi e\sigma^2\right) = \tfrac{1}{2}\ln e + \tfrac{1}{2}\ln 2\pi\sigma^2 = \tfrac{1}{2} + \tfrac{1}{2}\ln 2\pi\sigma^2 \tag{92}$$

reduces to

$$C > \sigma^2 . \tag{93}$$

This is to be expected since $C - \sigma^2$ is the expected value of the metric on the correct path and the operation of the decoder presupposes this to be positive.
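The equivalence between the entropy condition on R and the condition that C exceed the noise variance can be checked numerically. The sketch below assumes the relation $C = \sigma^2(2R - \ln 2\pi\sigma^2)$ and the Gaussian entropy $H(N) = \tfrac{1}{2}\ln(2\pi e\sigma^2)$; the function names and the variance value are hypothetical.

```python
import math


def bias_constant(R, sigma2):
    """Bias constant C that makes the metric of Eq. (90) proportional to Eq. (91)."""
    return sigma2 * (2.0 * R - math.log(2.0 * math.pi * sigma2))


def gaussian_entropy(sigma2):
    """Entropy (in nats) of a Gaussian noise sample of variance sigma2."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)


sigma2 = 0.25                      # hypothetical noise variance
H = gaussian_entropy(sigma2)
C_at_entropy = bias_constant(H, sigma2)       # boundary case, equals sigma2
C_above = bias_constant(H + 0.1, sigma2)      # R > H(N) gives C > sigma2
C_below = bias_constant(H - 0.1, sigma2)      # R < H(N) gives C < sigma2
```

Algebraically, $C - \sigma^2 = 2\sigma^2\,[R - H(N)]$, so the margin of R over the entropy sets the expected metric gain per sample on the correct path.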

The hypothesized noise-free output values are computed according to a subprogram which is changed with the measurement problem. Usually these values are time consuming to compute. It is therefore desirable to store them until it no longer appears that they will be needed. Then they can be discarded. In the simulation program discussed here, a list structure technique is used to store the values in a manner that makes their recall time short but does not require rearrangement of data in the store when items are to be discarded. This technique is discussed in detail in Sec. V-D.

Finally, provision is made to output the decoder's conditions at a selected frequency of passage through loop A. This output is in either a printed form giving the current hypothesis or in an oscillographic form displaying the metric values along those branches investigated by the decoder. Photographs of this display were presented in Fig. 6 to illustrate the operation of the algorithm.

D.  Hypothesis  Storage 

Because  of  the  computation  time  necessary  in  many  measurement  problems  to  compute  the 
noise-free  output  resulting  from  a  particular  hypothesis  and  probe  signal,  it  is  worthwhile  to 
consider  techniques  of  storing  these  quantities.  A  satisfactory  method  must  permit  rapid  access 
and  small  bookkeeping  cost  with  respect  to  time  and  storage. 

Several  obvious  techniques  present  themselves.  First,  a  storage  location  could  be  provided 
for  each  possible  composite  hypothesis.  The  multidimensional  aspect  of  the  hypothesis  vector 
makes  this  procedure  absurd  because  of  the  huge  storage  required. 

Second, a storage location could be provided for all hypotheses having a common first part, but differing in the tail. This method would require $D^t$ locations if the tail is of length t. Thus the number of required locations remains fixed, as the decoder advances further into the tree. However, care must be taken in the design of the storage to permit rapid access to the information and to avoid excessive time spent in moving the data within the store as the decoder advances.

The  method  chosen  for  the  simulation  is  of  this  type,  but  once  the  method  is  described,  a 
third  method  can  be  suggested  which  permits  the  length  of  the  tail  to  vary  in  a  desirable  manner. 
Before  continuing,  it  is  necessary  to  say  a  few  words  about  list  structures. 

A list in a computer is a group of storage locations which are tied together by means of secondary locations we shall refer to as links. These links contain the machine addresses of other members of the list. Table I is an example of a three-element list. Link A contains the address of link B, and link B contains that of link C, etc. $N_A$, $N_B$, and $N_C$, entries on the list, are thus tied together by the links. If link A is tagged in some way to be the designated first element of the list, we can consider the list in the table as an ordered list, with the links detailing the order A, B, C.

TABLE I
A SIMPLE ORDERED LIST

Machine Address    Symbolic Name     Contents
4714               Link C            000
4715               List element C    N_C
6102               Link A            7452
6103               List element A    N_A
7452               Link B            4714
7453               List element B    N_B
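The list of Table I can be mimicked with a small Python sketch in which a dictionary stands in for machine storage. Two points are assumptions made for illustration: that each list element occupies the word immediately after its link (as the addresses in the table suggest), and that the contents 000 in link C mark the end of the list.

```python
# Dictionary standing in for machine storage: address -> contents.
memory = {
    4714: 0,       # Link C: 000, taken here to mark the end of the list
    4715: "N_C",   # List element C
    6102: 7452,    # Link A holds the address of link B
    6103: "N_A",   # List element A
    7452: 4714,    # Link B holds the address of link C
    7453: "N_B",   # List element B
}


def walk(memory, head):
    """Follow the links from the designated first link and collect the entries."""
    entries = []
    link = head
    while link != 0:
        entries.append(memory[link + 1])  # element assumed to follow its link
        link = memory[link]               # move to the next link
    return entries


order = walk(memory, 6102)  # start at link A
```

Although the words are scattered through storage, following the links recovers the order A, B, C without moving any data.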

A  list  structure  develops  if  one  of  the  members  of  a  list  is  the  name  of  a  second  list,  known 
as  a  sublist  of  the  first.  The  process  may,  of  course,  continue  indefinitely,  with  sublists  on 
sublists  of  other  lists,  etc. 

With these preliminary definitions, we can describe the storage technique used in the simulation program. Each node which the decoder considers is stored as a separate ordered list. When the node list is first created, it contains 2D list elements, but an additional element is added whenever one of the branches leaving this node is tried by the decoder. Of the original list elements, the odd-numbered ones contain the hypothesized parameter values for the node and the even-numbered ones contain the noise-free output values. Exactly which noise-free output values are contained in the original even-numbered list elements will become apparent when the list structure is indicated. The parameter values of the list are ordered according to their likelihood on the basis of the data.

Once the decoder chooses a parameter at a node, it moves to the next node in the tree and creates a new list for it. The name of this list is entered on the list corresponding to the previous node, two entries below the chosen parameter value. Thus the main list has a tree structure of sublists, each corresponding to a node in the tree. For the sublist corresponding to a particular node to be reached, one must start at the main list and then proceed further to those successive sublists designated by the list names in the list structure. The structure is illustrated in Fig. 16.

Fig. 16. List structure. (a) Tree structure. (b) Corresponding list structure.
Notes: (1) Z4 is the noise-free output resulting from alternative 1 for the first parameter and alternative 2 for the second parameter. (2) At node D, the second alternative is more likely than the first.

To recover a noise-free output value from the list structure, one proceeds through the structure choosing successive lists to visit on the basis of the list names that follow the hypothesized parameters on each sublist. The list element below the hypothesized parameter on the list corresponding to the node of interest is the desired output value. Although the list structure method of storage sounds complex, it is only conceptually more involved than the usual methods. The important feature is that a storage word containing the machine address of the link belonging to the current hypothesis needs to be interrogated in order to determine the current decoder position and the data relevant to it. This storage word is then modified according to the information stored in the links as the decoder progresses. Because of the tree-like nature of the list structure, only a few links need be taken to reach any list that corresponds to a node of interest to the decoder.
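The recall procedure can be sketched with ordinary Python objects in place of the linked lists of the simulation program. The class `NodeList`, its methods, and the output labels Z1 to Z4 are hypothetical names for this illustration; each entry groups a parameter value, its noise-free output, and the sublist created when that branch is tried.

```python
class NodeList:
    """One tree node: alternatives assumed already ordered by likelihood."""

    def __init__(self, alternatives):
        # alternatives: list of (parameter value, noise-free output) pairs
        self.entries = [
            {"param": p, "output": z, "sublist": None} for p, z in alternatives
        ]

    def extend(self, choice, alternatives):
        """Create the next node's list and enter it under the chosen branch."""
        child = NodeList(alternatives)
        self.entries[choice]["sublist"] = child
        return child


def recall_output(root, choices):
    """Follow the sublists named by the choices and return the stored output."""
    node = root
    for c in choices[:-1]:
        node = node.entries[c]["sublist"]
    return node.entries[choices[-1]]["output"]


# Two-level example with D = 2 alternatives per node (indices are zero-based).
root = NodeList([(1.0, "Z1"), (0.1, "Z2")])
root.extend(0, [(1.0, "Z3"), (0.1, "Z4")])
value = recall_output(root, [0, 1])  # alternative 1, then alternative 2
```

Only one link is followed per tree level, which is why recall remains fast even when many nodes are stored.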

When one decides to remove a node from the store, one must delete only the corresponding list name from lists on which it appears and then inform the bookkeeper that the extra list elements used in the list's formation are now available for other lists yet to be generated.

The removal of an entry in the list structure is governed, in the simulation program, by a simple but not optimum technique. Once the decoder reaches a fixed depth, say t, beyond a node, all superfluous nodes prior to t nodes from the end of the tree are removed from the structure. Hence the required storage remains constant.

We note, however, that only a few of the $D^t$ hypotheses in the tail will be tried; therefore, keeping $D^t$ locations available in storage is wasteful of space. A better technique would consist of keeping the storage size fixed at some convenient level, and then removing early nodes from the store when, and only when, an overflow occurs. Hence a storage size of $D^t$ would permit storage of nodes at depths earlier than t before the end of the tree.
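The overflow-driven alternative suggested above can be sketched as a fixed-capacity store that discards the shallowest stored node only when a new node would not fit. This is a hypothetical illustration, not the technique used in the simulation program; the class name `BoundedNodeStore` and the use of a heap keyed on depth are assumptions, and node keys are assumed to be unique.

```python
import heapq


class BoundedNodeStore:
    """Fixed-size store that drops the earliest (shallowest) node on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}       # key -> stored noise-free outputs
        self.by_depth = []    # min-heap of (depth, key)

    def add(self, key, depth, outputs):
        if len(self.nodes) >= self.capacity:
            # Overflow: remove the node closest to the origin of the tree.
            _, earliest = heapq.heappop(self.by_depth)
            del self.nodes[earliest]
        self.nodes[key] = outputs
        heapq.heappush(self.by_depth, (depth, key))


store = BoundedNodeStore(capacity=2)
store.add("a", 0, [0.1])
store.add("b", 1, [0.2])
store.add("c", 2, [0.3])   # overflow: node "a" at depth 0 is discarded
```

Nothing is discarded until the store is full, so nodes well behind the decoder remain available whenever few tail hypotheses have been tried.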

Coding the list manipulations was greatly simplified by the use of the Symmetric List Processor language (Ref. 14). This consists of a set of FORTRAN subroutines which automatically establish the links and extract information from the list structure as well as performing many other bookkeeping tasks. The reader is referred to Ref. 14 for further information on the system.

With  these  comments  on  the  simulation  program,  we  proceed  to  a  discussion  of  the  simu¬ 
lation  itself. 

E.  Simulation  Experiments 

As  discussed  in  Sec.  II-E,  a  simplified  model  of  the  geophysical  exploration  problem  was 
chosen  to  provide  a  measurement  situation  for  testing  the  theoretical  results  by  simulation.  This 
model  is  described  in  that  section. 

In the simulation itself, the true impedance levels were chosen randomly in such a manner as to represent layers of various thicknesses. That is, a layer four units thick would be modeled by four equal, consecutive impedance values. Thus a fairly realistic fit could be made to many geological situations that involved only two materials.
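A layered two-material profile of this kind can be generated with a short sketch. The generator below is purely illustrative and is not how the values of the study were chosen; the function name, the maximum layer thickness, and the seed are assumptions, while the two impedance levels 1.0 and 0.1 are those listed in Table II.

```python
import random


def layered_impedances(n, levels=(1.0, 0.1), max_thickness=6, seed=0):
    """Return n impedance values made of alternating layers of random thickness."""
    rng = random.Random(seed)
    values = []
    level = rng.choice(levels)
    while len(values) < n:
        thickness = rng.randint(1, max_thickness)   # layer thickness in units
        values.extend([level] * thickness)          # equal consecutive values
        level = levels[1] if level == levels[0] else levels[0]
    return values[:n]


profile = layered_impedances(32)
```

Each run of equal consecutive values represents one layer, so a layer four units thick appears as four equal entries in the list.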

The input signal was chosen, at first, to be a random sequence of pulses, but it soon became apparent that the largest possible signal-to-noise ratio should be provided to make the first estimate of an impedance value and thus a single pulse of maximum available energy is preferable. Only if there is a peak power limitation should an extended input be used.

For the given true parameter set and the input signal, it is possible to compute the true noise-free observed data. This was done and noise was added from a set of Gaussian-distributed independent random samples. The noise level was varied by scaling a set of unit variance samples by the standard deviation. Once this addition was performed, a set of observed samples was available upon which the decoder could operate.

The program was organized to perform a number of sequential measurements on the same set of impedance values by varying for each the noise variance $\sigma^2$, the bias constant C, and the threshold increment $T_0$. In addition, the number of times the decoder passed through loop A in the flow chart of Fig. 5 was tallied to permit a progress report at any specified frequency. Provision was made to halt the measurement after a fixed number of passages through loop A. As we shall see in the next section, this possibility for termination will enter into the results of the simulation study.

To be definite, we shall define an experiment using the simulator as an attempt to decode all unknown impedance values in a geophysical model that has a particular set of true quantized impedance values, a particular probe signal, a particular set of noise samples of variance $\sigma^2$, and particular values for the parameters C and $T_0$. A great many experiments of this type were conducted, but the total number was, of course, a minute fraction of those possible. The choice of which experiments to perform was governed to a great degree by experience gained during the on-line portion of the experimentation during which few actual data were obtained. On the basis of this experience, it was possible to choose a set of experiments that would be reasonable in number yet meaningful in result.

Fig. 17. Number of computations vs bias (simulated). (a) First parameter set. (b) Second parameter set. (c) Third parameter set.

As mentioned earlier, the true impedance values were chosen to suggest a layered structure. Three different sets of 32 values were employed in the experimentation, each displaying a different degree of randomness. However, in choosing the values themselves, no conscious pattern was used. The intent was to represent typical geological environments. The characteristics of each of these sets will be discussed in more detail later.

The probe signal was a single pulse of amplitude 5.567. This particular value, approximately $\sqrt{31}$ so that the single pulse carries the same energy as 31 unit pulses, arose from a desire to compare the single-pulse results with those obtained for a sequence of 31 unit pulses with a good autocorrelation function. Although no complete data were taken, the improvement obtained by using the single pulse was immediately apparent.

The noise samples were obtained from a Gaussian pseudo-noise generator available in the IBM 7094 library (Ref. 15). Only a few sequences of noise samples were used in the tests, and the noise level was varied by scaling the samples according to the standard deviation. This procedure permitted an evaluation of the effect of changing a parameter of the decoding process without concern for variation in the noise sequences, and without using the alternative of Monte Carlo operation to average the noise effects. In general, the change to a different basic sequence of noise samples did not affect the data significantly.

The noise variance $\sigma^2$ was varied a great deal in the experiments. It is important to note that its value was measured with respect to an input pulse amplitude. Thus, when one considers to what degree the noise obstructs the observations, a direct comparison of the noise variance with the effect under scrutiny is necessary. For example, if one is trying to measure an effect that appears in the fourth decimal place, it would be difficult to observe if the noise had a variance of $10^{-8}$ and a standard deviation of $10^{-4}$.

The threshold increment $T_0$ was varied along with C, the bias constant, and was always set equal to C/2. From the on-line experimentation, it was evident that $T_0$ variations did not affect the results significantly, unless it was chosen too small. We shall see some data in support of this view later.

Finally, the bias constant C was varied considerably. Since the ratio $C/\sigma^2$ must exceed unity for the average metric value along the correct path to be positive, it was varied from that level by two orders of magnitude. We shall see that its choice was important in determining the outcome of an experiment.
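The parameter choices described in the last few paragraphs amount to a simple grid: for each noise variance, a set of bias constants expressed as multiples of the variance, with the threshold increment tied to C/2. The sketch below is an assumed illustration; the function name and the particular variances and ratios are hypothetical, not the values used in the study.

```python
def experiment_grid(variances, ratios):
    """List the (sigma2, C, T0) settings, with C = ratio * sigma2 and T0 = C / 2."""
    grid = []
    for sigma2 in variances:
        for ratio in ratios:
            C = ratio * sigma2          # ratio must exceed unity
            grid.append({"sigma2": sigma2, "C": C, "T0": C / 2.0})
    return grid


# Hypothetical settings: ratios spanning two orders of magnitude above unity.
grid = experiment_grid(variances=[1.0e-2, 1.0e-4], ratios=[2.0, 10.0, 100.0])
```

Expressing C as a multiple of the variance keeps every setting on the side of the constraint where the correct path has a positive average metric.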

Thus the main variations in the experimentation were of the noise variance $\sigma^2$ and the bias constant C. The results of the experiments could then be presented as a set of experimental curves of the same form as those derived theoretically in Sec. III.

F.  Simulation  Results 

Figure 17(a-c) presents the results of the simulated measurement. Each point on these curves indicates the number of computations required to estimate a complete set of 32 parameters with a specified noise variance $\sigma^2$, threshold increment $T_0$, and bias constant C. All points thus plotted in each curve are for the particular set of 32 quantized impedance values listed in Table II. As becomes clear from an examination of the figures, the simulated curves are similar in form to those derived theoretically, but vary distinctly among themselves in the noise variance value used to produce each separate curve. We shall see that this is due to the fact that a distinctly different differential bias parameter $\delta$ obtains for each set of parameter values.

TABLE II
PARAMETER VALUES FOR SIMULATION

                  Parameter Set For
Depth    Fig. 17(a)    Fig. 17(b)    Fig. 17(c)
  1         1.0           1.0           1.0
  2         1.0           1.0           0.1
  3         1.0           0.1           1.0
  4         0.1           0.1           1.0
  5         0.1           0.1           1.0
  6         0.1           0.1           1.0
  7         1.0           1.0           0.1
  8         1.0           0.1           0.1
  9         1.0           0.1           1.0
 10         0.1           0.1           1.0
 11         0.1           1.0           0.1
 12         0.1           1.0           1.0
 13         0.1           1.0           0.1
 14         0.1           1.0           0.1
 15         0.1           1.0           1.0
 16         0.1           1.0           0.1
 17         0.1           0.1           0.1
 18         1.0           0.1           1.0
 19         1.0           1.0           1.0
 20         1.0           1.0           1.0
 21         1.0           1.0           0.1
 22         1.0           1.0           0.1
 23         0.1           1.0           1.0
 24         0.1           0.1           0.1
 25         0.1           0.1           0.1
 26         1.0           1.0           1.0
 27         1.0           1.0           0.1
 28         1.0           0.1           1.0
 29         1.0           0.1           0.1
 30         1.0           0.1           0.1
 31         1.0           0.1           0.1
 32         1.0           1.0           1.0

We note that there are four types of points on these curves. First, there are those corresponding to a correct estimate of the complete parameter set; these are plotted along the curves. Second, there are those corresponding to an incorrect estimate of one or more parameters; these are plotted at the top of the graph. The third and fourth types arise because of the limit placed on the number of computations. If the decoder performs this maximum number of computations and tries the correct set of parameters, and this set has the maximum metric, the decoder has in effect been successful, although it did not satisfy its internal constraints. Such a point is plotted at N = 200, the maximum number of computations permitted. If the decoder does not try the correct set before reaching the computational limit, an error is made. Such a point is also plotted at the top of the graph. It is possible to have a fifth type of point, although this did not occur in the simulation. If the decoder tries the correct parameter set as well as some others and one of these incorrect sets has the highest metric, the decoder will err. However, we note that any unbiased estimation procedure will also err, since the received signal vector is no longer closest to the correct noise-free vector in the output space.
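The classification of points just described can be written as a small decision function. This is a sketch for illustration only; the function name, argument names, and returned labels are hypothetical.

```python
def classify_run(finished, estimate_correct, tried_correct, correct_has_max_metric):
    """Assign a simulated run to one of the point types described in the text."""
    if finished:
        # Types 1 and 2: the decoder completed within the computation limit.
        return "correct estimate" if estimate_correct else "error"
    if not tried_correct:
        # Type 4: limit reached before the correct set was ever tried.
        return "error at limit"
    if correct_has_max_metric:
        # Type 3: limit reached, but the correct set holds the maximum metric.
        return "success at limit"
    # Type 5: the correct set was tried but an incorrect set scores higher.
    return "error, incorrect set has maximum metric"


outcome = classify_run(finished=False, estimate_correct=False,
                       tried_correct=True, correct_has_max_metric=True)
```

Only the first and third outcomes count as successful measurements; the remaining outcomes are errors and are plotted at the top of the graph.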

This limitation on the number of computations was not the only special condition of the simulation.
In all our previous discussions, we assumed that the decoder operated with perfect precision.
This did not hold true in practice, and in several instances the effect of finite precision
became apparent. Generally speaking, these occurred when both the additive noise level and
the bias constant were set at low levels. In this event, the metric increments on an incorrect
branch would be several orders of magnitude larger than the total metric value. Thus, when
an incorrect branch was tested, the threshold would be violated. When the decoder tried to retrace
its steps to return to the correct path, all precision in the metric would have been lost.

This difficulty with the computer's precision arose also when internally calculated values
were compared with the same values after they had been transferred through the computer's
input-output facility. Round-off errors brought about a second noise source that proved to be larger
than the additive noise on several occasions.

As stated earlier, the data resulting from the simulation are presented in Fig. 17(a-c).
Some general comments can be made about them. In the first place, they are seen to have the
same over-all shape as the theoretical curves. For small bias constant, the number of computations
is large because the correct path will tend to have negative metric increments as well
as the incorrect paths. Thus the decoder may never leave the correct path, but it will repeatedly
be forced back to the origin of the tree by increasingly negative metric values. For large bias
constant, the incorrect paths will appear correct for several branches before a sufficient number
of incorrect branches has been traversed to make the incorrect path have a very negative
metric increment. Since the decoder will have to modify several hypotheses before returning
to the correct path, the number of computations to rectify the error will be large.

An examination of the variance values on each curve indicates that the number of computations
decreases as the noise level decreases. Indeed, it would be surprising if it were otherwise.
However, we note that the particular noise variance on each curve is different for each
set of true impedance values. In the next section, we shall see that this is due to a marked difference
in the degree of dissimilarity between the correct output vector and the set of incorrect
output vectors.

Fig. 18. Number of computations vs ratio of threshold increment to bias constant.

Fig. 19. Estimated $\delta^2$ vs depth.

A few less formal results were obtained from the simulation. Although an attempt was made
to analyze the microscopic behavior of the decoder, it did not seem useful to make any quantitative
analyses; however, a few qualitative remarks are in order. Generally, the performance was as
expected. As the bias constant was increased for a particular noise level, the number of computations
required to correct a given error grew. This was due to the tendency of the metric on
incorrect paths to increase when the bias is large. However, the number required to terminate
the estimation after the correct hypothesis was tried for the first time decreased as the bias
constant was increased. This was due to the fact that the tail includes many noise samples and
this tends to produce a negative metric even on the correct path. As the bias constant becomes
large, this tendency lessens. Finally, as the noise level increases, the decoder makes its first
error at an earlier depth in the tree.

Setting the threshold increment $T_0$ at the value $C/2$ was done on the basis of evidence obtained
from preliminary simulation work. Clearly, it should not be chosen too small or else
many computations would be necessary to lower the threshold a fixed amount. If it is too large,
the metric values on incorrect paths, although they decrease, will not fall below the threshold
soon enough, causing incorrect paths to be searched unnecessarily. To check this choice of $T_0$,
a series of runs was performed for different values of the ratio $T_0/C$. The results are presented
in Fig. 18. It is clear that the choice of $T_0/C$ is not critical.

G.  Discussion  of  Results 

Unfortunately, it is not possible to make a direct quantitative comparison of the theoretical
and experimental results. Such a comparison cannot be expected since, in the theoretical work, it was
assumed that the bias introduced by following the incorrect path was independent of depth, whereas
in the simulated geophysical problem this bias decreases exponentially with depth owing to the
multiplicative coupling between layers. Nevertheless, some qualitative comparisons can be
made.

We have already noted that the form of the curves obtained by simulation is similar to
that of the curves obtained theoretically. We also remarked on the differences in variance required
to produce a set of curves due to the dissimilarities in true impedance values. This dissimilarity
would be reflected in the value of the differential bias parameter $\delta$, if it could be computed.
However, as noted earlier, computation of $\delta$ would require exhaustive effort.

Fortunately, a simply calculated quantity $\hat{\delta}$ indicates the magnitude of the effect that is
used when a value for a parameter is first hypothesized. This quantity is the value obtained by
calculating the noise-free output for each alternative at the nodes along the correct path, and then
taking the difference between the output for the correct alternative and that for the incorrect one.
For the particular parameter sets used in the simulation, this calculation was performed. The
result as a function of depth is plotted for each parameter set in Fig. 19.
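
The calculation just described can be sketched in a few lines. This is only an illustration under stated assumptions: `primary_reflection` is a hypothetical toy forward model (primary reflection amplitude with two-way transmission losses, which gives a multiplicative coupling between layers), not the model used in the simulation, and the impedance values and function names are invented for the example. With more than two alternatives the sketch takes the smallest difference, which is also an assumption.

```python
from typing import Callable, List, Sequence


def primary_reflection(prefix: Sequence[float], z_surface: float = 1.0) -> float:
    """Toy noise-free output at depth len(prefix) (hypothetical model).

    Returns the reflection coefficient of the deepest interface, scaled by the
    two-way transmission factor (1 - r^2) of every interface above it.
    """
    amplitude = 1.0
    z_prev = z_surface
    r = 0.0
    for z in prefix:
        amplitude *= 1.0 - r * r          # loss through the previous interface
        r = (z - z_prev) / (z + z_prev)   # reflection coefficient at this interface
        z_prev = z
    return amplitude * r


def delta_hat(forward: Callable[[Sequence[float]], float],
              correct_path: Sequence[float],
              alternatives: Sequence[float]) -> List[float]:
    """Output difference between the correct and incorrect alternative at each
    node along the correct path (smallest difference if several are incorrect)."""
    result = []
    for k, correct in enumerate(correct_path):
        prefix = list(correct_path[:k])
        out_correct = forward(prefix + [correct])
        diffs = [abs(out_correct - forward(prefix + [alt]))
                 for alt in alternatives if alt != correct]
        result.append(min(diffs))
    return result


# Two equally likely impedance values with a ratio of 10 (illustrative numbers).
path = [2.0, 20.0, 2.0, 20.0]
d_hat = delta_hat(primary_reflection, path, alternatives=(2.0, 20.0))
for depth, value in enumerate(d_hat, start=1):
    print(depth, round(value, 4))
```

In this toy model the difference shrinks with depth once transmission losses accumulate, which is the qualitative behavior attributed to the geophysical problem in the text.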

We can now compare the curves obtained by the simulation with those obtained theoretically.
From Fig. 19, we observe that the parameter set used to derive the curves of Fig. 17(a) has a
lower $\hat{\delta}$ than that used to derive the curves of Fig. 17(b), and therefore should be less difficult
to estimate. Indeed, that was the case. In the same way, the parameter set used for Fig. 17(c)
is predicted to be more difficult to estimate than either of the other sets. Again, we note the
agreement.

An estimate of the degree to which the particular parameter sets used in the simulation are
typical can be obtained by computing the average value of $\hat{\delta}$ over the ensemble of parameter sets
with two impedance values being equally likely. This computation is given in Appendix D and the
result for a ratio of 10 between the two impedance values is plotted in Fig. 19, along with the
curves for the specific parameter sets. We see that the sets used were both better and worse than
the average.

The curves describing the results of the simulation only indicate the experiments that were
performed with a small enough noise variance so that the decoder would be successful in correctly
estimating all 32 unknown impedances for some value of the bias constant. Other experiments
were also carried out at higher noise levels at which the decoder could not successfully hypothesize
the 32 values within the 200 computations allowed. In fact, as the noise level increased,
the number of required computations grew more and more rapidly. Thus we see experimental
evidence of the existence of a quantity analogous to $R_{\mathrm{comp}}$, the rate in communications at
which the number of computations begins to grow without bound. This quantity is particularly
important in connection with quantization effects, and therefore will be discussed further in Sec. VI.

In  addition  to  the  results  obtained  on  the  decoder's  performance,  both  quantitatively  and 
qualitatively,  some  insight  into  the  geophysical  problem  was  obtained.  We  shall  discuss  this 
understanding,  as  well  as  the  effects  of  using  the  sequential  algorithm  on  a  nonquantized  problem, 
in  the  next  section. 

H.  Summary 

The  results  of  the  simulation  have  borne  out  the  theoretical  results  insofar  as  the  general 
behavior  of  the  number  of  computations  vs  the  bias  constant  is  concerned.  A  direct  comparison 
of  results  is  difficult  because  the  decreasing  amplitude  of  the  effect  depends  on  the  unknown 
parameters  in  the  simulated  case.  In  view  of  the  over-all  character,  however,  it  seems  safe 
to  say  that  the  assumptions  were  reasonable  and  that  the  sequential  measurement  procedure  is 
satisfactory  on  this  simplified  model  of  the  geophysical  layering  problem. 

VI.  QUANTIZATION  EFFECTS 

A.  Introduction 

In this section, we consider briefly the problems arising from the quantization of the unknown
parameters. We shall see that there is an upper limit to the precision obtainable with the sequential
technique which may be below that obtainable with some other method. In addition, a
masking noise arises which must be considered along with the additive noise in determining the
total noise level.

B.  Computational  Cutoff 

In the hypothesis testing done by the sequential algorithm, the differential bias parameter
$\delta$ specified to what degree the various alternatives at a node affected the noise-free output vector.
If these alternatives represent a set of quantization steps for a continuous parameter, the
magnitude of $\delta$ is a measure of the effect produced at the output by a change of one quantization
step.

The magnitude of $\delta$ is determined partly by the signal energy, partly by the transformation
introduced by the transducer being measured, and partly by the size of the quantization steps.
For a fixed available energy, the only one of these items which can be varied by the observer
is the size of the quantization steps.

It is important to note that the ratio of the available energy to the receiver noise level is not
sufficient to determine the precision. In particular, one must account for distortions which the
signal must undergo in the transducer after it "picks up" information about the unknown parameter,
but before it can be observed. If, for example, this distortion were a saturation effect, a
large change in the unknown parameter would be necessary to effect a small change in the transducer
output. Thus, even though both the input and output energies, relative to the noise level,
were large, those details in the output needed to determine precisely the unknown parameter
would be lost in the compression. Consequently, one must consider transducer effects, as well
as energy and noise level, in determining the precision that is possible.
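
The saturation example can be made concrete with a small numerical sketch. The hyperbolic-tangent characteristic, the operating points, and the step size below are all assumptions chosen for illustration; the report does not specify a particular saturating transducer.

```python
import math


def output_change(x: float, step: float, gain: float = 1.0) -> float:
    """Change in the output of a tanh-shaped (saturating) transducer when the
    unknown parameter moves from x to x + step. Hypothetical characteristic."""
    return abs(math.tanh(gain * (x + step)) - math.tanh(gain * x))


# The same parameter step observed in the linear and in the saturated region.
linear_region = output_change(0.0, 0.5)
saturated_region = output_change(3.0, 0.5)

print("linear region   :", linear_region)
print("saturated region:", saturated_region)
print("ratio           :", linear_region / saturated_region)
```

The output energy is larger at the saturated operating point, yet the same parameter step moves the output far less there, so the output energy alone does not determine the attainable precision.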

We saw in both the theoretical and simulation work that when $\delta$ was too small relative to
the noise, the decoder no longer efficiently chose the correct hypothesis set. Instead, it effectively
began an exhaustive search of all possible hypotheses. Thus there was a critical value of
the ratio $\delta/\sigma$ below which the decoder was ineffective. For fixed available energy, noise level,
and transducer, this implies that there is a critical quantization step size below which the sequential
method cannot be used. Since the size of the quantization steps indicates the precision
of the measured parameter, the noise level, the available energy, and the transducer all contribute
to a maximum degree of precision that can be obtained.

It is informative to compare this limit with the corresponding limit in the communications
case. Sequential decoding was found to be an effective and efficient decoding technique, as long
as a particular rate $R_{\mathrm{comp}}$ was not exceeded. If communication at a higher rate was tried, the
frequency of lengthy searches became so high that the average number of computations began to
grow rapidly with constraint length.

If we now note that the precision of a parameter is the amount of information needed to specify
it, we see that the precision obtained from a measurement is analogous to the rate of transmission
in communications. Thus the critical size of the quantization steps in measurements
and $R_{\mathrm{comp}}$ are analogous quantities. In addition, we note that it may be possible, by using an
exhaustive nonsequential search procedure, to measure the unknown parameters to a higher degree
of precision than is possible with a sequential method. This only means that the critical
"rate" is below the maximum rate or channel capacity imposed by the available energy, the noise
level, and the transducer characteristics.

C.  Masking  Noise 

In the preceding section, we observed that the available energy, the noise level, and the
transducer set an upper limit to the degree of precision that can be obtained. The resultant imperfect
precision leads to an effect which we shall refer to as masking noise.

When the sequential measurement technique is used, the $k$th hypothesis is made on the basis
of a quantity which was derived from the observed data vector and the set of $k - 1$ hypotheses
that has already been made. Define this quantity as the reduced data point for the $k$th hypothesis.
Because of the lack of precision in estimating the first $k - 1$ unknown parameters, it will
not be possible to compute exactly the reduced data point for the $k$th hypothesis. The imprecision
that results will be defined as the masking noise and must be considered with the additive noise
when evaluating the noise level. If the precision is sufficiently high in estimating the first $k - 1$
parameters, the masking noise will be small and will be dominated by the additive noise. If the
precision is low, the masking noise will be the dominant problem. Thus the precision with which
early estimates are made affects the error probability and number of computations for later estimates
through the masking noise level.

We note that the masking noise is highly structured, since many dependencies exist among
samples. Thus it cannot be considered simply as an increase in the additive noise level.
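
A minimal sketch of this definition follows, assuming a linear lower-triangular model in which the $k$th output depends on the first $k$ parameters through known coefficients `coupling[k][j]`. That model, the uniform quantizer, and all names and numbers are assumptions made for illustration; the report does not restrict the transducer to this form. The earlier hypotheses are taken to be correct to within one quantization step, so the masking noise shown is the best case for a given step size.

```python
from typing import List, Sequence


def quantize(value: float, step: float) -> float:
    """Nearest multiple of `step` (the hypothesis the decoder could at best select)."""
    return step * round(value / step)


def masking_noise(true_params: Sequence[float],
                  coupling: Sequence[Sequence[float]],
                  step: float) -> List[float]:
    """Error in the reduced data point for each hypothesis k, caused solely by
    the quantization error in the k earlier parameter estimates."""
    estimates = [quantize(x, step) for x in true_params]
    noise = []
    for k in range(len(true_params)):
        noise.append(sum(coupling[k][j] * (true_params[j] - estimates[j])
                         for j in range(k)))
    return noise


params = [0.37, 1.12, 0.81, 1.49]                 # illustrative true values
coupling = [[1.0] * 4 for _ in range(4)]          # illustrative unit coupling

coarse = masking_noise(params, coupling, step=0.5)
fine = masking_noise(params, coupling, step=0.1)
print("coarse steps:", [round(v, 3) for v in coarse])
print("fine steps  :", [round(v, 3) for v in fine])
```

Because each entry is a sum over the same earlier errors, successive values are strongly dependent, which is why the masking noise cannot be treated as an additional independent noise source.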

D.  Precision  in  Geophysical  Problem 

We have noted in the preceding section how the precision at one depth in the decoding tree
will affect the decoder in making estimates at later depths. We also noted in Sec. V that the output
effect of varying an unknown parameter is an exponentially decreasing function of the number
of previous discontinuities. In this section, we discuss the results of these effects in connection
with the geophysical problem considered by simulation in Sec. V.

The limitation in precision discussed in Sec. VI-B led to the definition of a minimum value
of the ratio $\delta/\sigma$ for which the sequential procedure could be used. Since this ratio decreases
exponentially with depth in the geophysical layering problem, the precision which can be obtained
(at a fixed noise level) decreases with depth as well. Thus the number of quantization levels
should be reduced as one proceeds to deeper levels in the tree.

The decreasing precision with which the impedance value of an increasingly deep layer can
be measured depends not only on the decreasing $\delta/\sigma$ ratio, but also on the increasing masking
noise level that arises. As indicated in the preceding section, the masking noise level increases
as one measures more and more parameters. Thus the masking noise level increases with depth
in the geophysical layering problem. For this reason, one should quantize to as many levels as
the $\delta/\sigma$ ratio permits. Then the masking noise will be reduced as much as possible for later
hypotheses.

Consequently, because of the decreasing $\delta/\sigma$ ratio and the increasing masking noise, we
see that the number of quantization levels should be decreased with depth, choosing the number
at each depth as large as the $\delta/\sigma$ ratio permits. Thus we will determine the unknown parameters
with a degree of precision that decreases with increasing depth.
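
The rule can be sketched numerically under several assumptions that are not taken from the report: the output effect of sweeping a parameter over its full range decays geometrically with depth (`delta0 * decay ** k`), the effect of a single quantization step is that full-range effect divided by the number of steps, and the critical value of $\delta/\sigma$ is a known constant `critical_ratio`. Masking noise is ignored here; it would lower the usable number of levels further.

```python
import math
from typing import List


def levels_per_depth(delta0: float, decay: float, sigma: float,
                     critical_ratio: float, depths: int) -> List[int]:
    """Largest number of quantization levels at each depth for which the
    per-step ratio delta/sigma stays at or above the critical value."""
    levels = []
    for k in range(depths):
        full_range_effect = delta0 * decay ** k
        # Number of steps such that (full_range_effect / steps) / sigma >= critical_ratio.
        steps = math.floor(full_range_effect / (sigma * critical_ratio))
        levels.append(1 + steps)   # one level means the parameter cannot be resolved
    return levels


plan = levels_per_depth(delta0=8.0, decay=0.5, sigma=0.25,
                        critical_ratio=2.0, depths=6)
print(plan)
```

With these illustrative numbers the permitted number of levels falls from 17 at the first depth to a single level at the sixth, at which point the parameter can no longer be estimated sequentially.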

VII.  SUMMARY  AND  RECOMMENDATIONS 
A.  Summary 

In this report, the applicability of a sequential measurement technique to a fairly broad
class of problems was considered and was analyzed both theoretically and experimentally by computer
simulation. Necessary conditions were determined under which the sequential procedure
could be successfully operated with a limited number of computations, and with an error probability
that decreased exponentially with the number of observations that can be made after the
last hypothesis. A parameter was defined which could be used to characterize a particular
measurement problem and in terms of which the performance could be estimated. In Secs. III
and IV, curves were derived to indicate upper bounds to the level of this performance.

Since the value of the performance parameters is frequently difficult to determine and since
many approximations were used in obtaining the theoretical results, it seemed desirable to simulate
the sequential measurement algorithm operating on a measurement problem of practical
interest. Such a simulation was performed on a geophysical exploration model. From the simulation,
it was possible to obtain curves of the same variables that were obtained theoretically,
and thereby to compare the simulated results with those calculated. The comparison seemed
to be a favorable one. The curves obtained by experiment were of the same general form as
those obtained from the theory and, by estimating the performance-dictating parameter mentioned
above, it was possible to make a slightly more specific comparison. Again, there seems to be
good agreement between experiment and theory.

The shortcomings of this study seem to lie mainly in the reality of the model. We require
a situation in which the successive data points are a function of an increasing number of unknown
parameters and one in which the separation between possible parameter values, as viewed from
the output, is clear. However, a preliminary investigation of the geophysical exploration problem
through computation and simulation indicates that it satisfies these conditions. Thus the set of
problems amenable to solution by the sequential measurement algorithm considered in this report
seems to have at least one member. A detailed study of other measurement problems
in order to formulate them into tree-like representation would be necessary for additional
applications.

B.  Suggestions  for  Further  Research 

Four research problems seem to follow as natural consequences of this work. In the first
place, the applicability of the model introduced here for the geophysical exploration problem
must be considered in greater detail. Such consideration would undoubtedly involve simulation
with actual seismic data as recorded under field conditions, instead of the highly idealized data
used in the simulation discussed above. Indeed, the simulator would require additional sophistication
to account for the many seismic records obtained from the usual array of geophones and
to include a priori information about the geological structure obtained from scattered drillings.
Only when a complete simulation of this type is attempted would the applicability of the sequential
method be ascertained.

The  second  area  for  further  work  lies  in  increasing  the  number  of  problems  to  which  the 
algorithm  applies.  Indeed,  there  are  many  multidimensional  parameter  estimation  problems 
of  large  proportions  that  are  unassailable  with  the  currently  used  hill-climbing  techniques.  If 
such  a  problem  could  be  stated  so  that  a  tree  structure  becomes  evident,  it  may  well  be  possible 
that  the  sequential  technique  would  be  applicable. 

Third, the possibility of a form of feedback can be noted. When the sequential algorithm is
having difficulty, the difficulty is readily apparent. Thus the observer could stop the processing
and rerun the experiment with new data or he could vary the parameters of the algorithm. Unlike
the communication problem, there is no continual data stream being received. Thus no
storage problem exists and the processing could be performed in nonreal time. Under such
conditions, flexibility in modifying the algorithm as it operates is available and this freedom
could be used to advantage.

Finally, it appears that a modification to the algorithm discussed here should be possible
to permit specific consideration of parameters with continuous a priori distributions. If this
distribution is known, and the noise distribution is also known, it is possible to measure the
degree to which a set of estimates, as a whole, agrees with the data. Thus, if one incrementally
picks the optimum value for a parameter in a sequential procedure, he may not be selecting the
same value he would obtain by a joint estimation procedure. This notion suggests a coarse estimate
with the incremental, sequential method, followed by a variational correction at a later
stage, if the coarse estimate appears correct. An explicit technique for such a procedure, as
well as its analysis, is outside the scope of this research. However, intuition gained from
dealing with the sequential techniques suggests that a modification of this type would be possible
and that its implementation would substantially extend the scope of the problems amenable to a
sequential measurement algorithm.

C.  Conclusions 

We have introduced a measurement technique that was suggested by the sequential decoding
procedure for convolutionally encoded messages. This method was analyzed and found to be
satisfactory, if several conditions of a fairly general nature were met. One specific, but complex,
measurement problem was considered in detail and it satisfied these conditions. It is hoped
that further research on this technique will show that it has applicability in other areas where
multidimensional parameter sets are to be measured.

APPENDIX A
LEMMAS


Lemma  1. 

If $\mu(r)$ is the log moment generating function of a random variable $m$, $\mu^*(t)$ is that of the
random variable $m^*$, and $\mu(r, t)$ is their joint log moment generating function, then

$$\mu(r, t) \le \mu_0(r) + \mu_0^*(t)$$

$$\mu(r) \le \mu_0(r)$$

$$\mu^*(t) \le \mu_0^*(t)$$

where $\mu_0(r) = \tfrac{1}{2}\mu(2r)$ and $\mu_0^*(t) = \tfrac{1}{2}\mu^*(2t)$.

Proof.

The Schwarz inequality, in its most general form, states that

$$E^2[f(x)\,g(x)] \le E[f^2(x)]\,E[g^2(x)]$$

thus

$$\gamma^2(r, t) = E^2[e^{rm} e^{tm^*}] \le E[e^{2rm}]\,E[e^{2tm^*}]$$

$$\gamma(r, t) \le [\gamma(2r)\,\gamma^*(2t)]^{1/2}$$

$$\mu(r, t) \le \tfrac{1}{2}\ln\gamma(2r) + \tfrac{1}{2}\ln\gamma^*(2t)
= \tfrac{1}{2}\mu(2r) + \tfrac{1}{2}\mu^*(2t)
= \mu_0(r) + \mu_0^*(t).$$

Also,

$$\gamma^2(r) = E^2[e^{rm}] \le E[e^{2rm}]$$

$$\gamma(r) \le [\gamma(2r)]^{1/2}$$

$$\mu(r) \le \tfrac{1}{2}\ln\gamma(2r) = \tfrac{1}{2}\mu(2r) = \mu_0(r).$$

Similarly,

$$\mu^*(t) \le \tfrac{1}{2}\ln\gamma^*(2t) = \mu_0^*(t).$$
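
As a numerical illustration of Lemma 1 (not part of the proof), the bounds can be checked on a small discrete joint distribution. The support points, the random weights, and the function names are arbitrary choices made for this sketch; the marginal log moment generating functions are obtained from the joint one by setting the other argument to zero.

```python
import math
import random
from typing import Dict, Tuple

Pmf = Dict[Tuple[float, float], float]


def log_mgf_joint(pmf: Pmf, r: float, t: float) -> float:
    """ln E[exp(r*m + t*m_star)] for a finite joint pmf {(m, m_star): prob}."""
    return math.log(sum(p * math.exp(r * m + t * ms) for (m, ms), p in pmf.items()))


def check_lemma1(pmf: Pmf, r: float, t: float) -> Tuple[float, float]:
    """Return (mu(r, t), mu_0(r) + mu_0*(t)) so the two sides can be compared."""
    mu_joint = log_mgf_joint(pmf, r, t)
    mu0 = 0.5 * log_mgf_joint(pmf, 2.0 * r, 0.0)        # (1/2) mu(2r)
    mu0_star = 0.5 * log_mgf_joint(pmf, 0.0, 2.0 * t)   # (1/2) mu*(2t)
    return mu_joint, mu0 + mu0_star


# Arbitrary dependent pair (m, m_star) on a 3 x 2 grid with random weights.
rng = random.Random(0)
support = [(m, ms) for m in (-1.0, 0.0, 2.0) for ms in (-2.0, 1.0)]
weights = [rng.random() for _ in support]
total = sum(weights)
pmf = {point: w / total for point, w in zip(support, weights)}

for r, t in [(0.3, -0.7), (1.0, 1.0), (-0.5, 0.2)]:
    lhs, rhs = check_lemma1(pmf, r, t)
    print(f"r={r:+.1f} t={t:+.1f}  mu(r,t)={lhs:.4f}  bound={rhs:.4f}")
```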
Lemma 2.


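Let $g(n)$ be a positive symmetric function of $n$, a random variable with symmetric probability
element $p(n)\,dn$. Let $g(n)$ be monotone decreasing as a function of $|n|$. Then, if $|\delta_1| \le |\delta_2|$,

$$E[g(n - \delta_1)] \ge E[g(n - \delta_2)].$$

Proof.

Let $G(n) = g(n - \delta_1) - g(n - \delta_2)$. By Lemma 2a, $G(n)$ is asymmetric about
$(\delta_1 + \delta_2)/2 \triangleq \delta_0$. If $\delta_2 < -|\delta_1|$, then $\delta_0 < 0$, and when $n < \delta_0$,
$|n - \delta_2| = |n - \delta_1 + (\delta_1 - \delta_2)| < |n - \delta_1|$. Thus, by the
monotone-decreasing assumption for $g(n)$, $G(n) < 0$ for $n < \delta_0$ and $G(n) > 0$ for $n > \delta_0$. Similarly,
if $\delta_2 > |\delta_1|$, $G(n) > 0$ for $n < \delta_0$, and $G(n) < 0$ for $n > \delta_0$. There are four cases:

(1) $\delta_2 > |\delta_1|$, $\delta_1 > 0$

(2) $\delta_2 > |\delta_1|$, $\delta_1 < 0$

(3) $\delta_2 < -|\delta_1|$, $\delta_1 > 0$

(4) $\delta_2 < -|\delta_1|$, $\delta_1 < 0$

We consider in detail case (1); the others follow in a similar manner.

$$
\begin{aligned}
\int_{-\infty}^{\infty} p(n)\,G(n)\,dn
&= \int_{-\infty}^{(\delta_1+\delta_2)/2} p(n)\,G(n)\,dn + \int_{(\delta_1+\delta_2)/2}^{\infty} p(n)\,G(n)\,dn \\
&= \int_{-\infty}^{(\delta_1+\delta_2)/2} p(n)\,G(n)\,dn + \int_{-\infty}^{(\delta_1+\delta_2)/2} p(\delta_1+\delta_2-n)\,G(\delta_1+\delta_2-n)\,dn \\
&= \int_{-\infty}^{(\delta_1+\delta_2)/2} p(n)\,G(n)\,dn - \int_{-\infty}^{(\delta_1+\delta_2)/2} p(\delta_1+\delta_2-n)\,G(n)\,dn \\
&= \int_{-\infty}^{(\delta_1+\delta_2)/2} G(n)\,[\,p(n) - p(n-\delta_1-\delta_2)\,]\,dn \\
&\ge 0 .
\end{aligned}
$$

The inequality follows from the fact that $p(n) \ge p(n - \delta_1 - \delta_2)$ for $n < (\delta_1 + \delta_2)/2$ and the other
manipulations are possible because of the symmetry assumption.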
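
Lemma 2 can be illustrated numerically with one particular choice of functions. The sketch below assumes $g(u) = e^{-|u|}$ and a zero-mean Gaussian density for $p(n)$ (which is nonincreasing in $|n|$, as the last step of the proof appears to require); both choices, the integration limits, and the grid size are assumptions for the example only.

```python
import math


def expected_g(shift: float, sigma: float = 1.0,
               lo: float = -12.0, hi: float = 12.0, n: int = 4801) -> float:
    """Riemann-sum approximation of E[g(n - shift)] with g(u) = exp(-|u|)
    and a zero-mean Gaussian density of standard deviation sigma."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        p = math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += p * math.exp(-abs(x - shift)) * h
    return total


# Shifts ordered by increasing magnitude: the expectation should not increase.
for shift in (0.0, 0.5, -1.0, 2.0):
    print(f"delta = {shift:+.1f}   E[g(n - delta)] ~ {expected_g(shift):.6f}")
```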

Lemma  2a. 

If $f(x)$ is a symmetric function of $x$, $f(x) - f(x - \delta)$ is asymmetric about $\delta/2$.

Proof.

By the definition of asymmetry about $\delta_0$,

$$g(x + \delta_0) = -g(-x + \delta_0);$$

then

$$
\begin{aligned}
f(x + \delta/2) - f(x + \delta/2 - \delta) &= f(x + \delta/2) - f(x - \delta/2) \\
&= f(-x - \delta/2) - f(\delta/2 - x) \\
&= -[\,f(-x + \delta/2) - f(-x - \delta/2)\,] \\
&= -[\,f(-x + \delta/2) - f(-x + \delta/2 - \delta)\,].
\end{aligned}
$$

Q.E.D.

Lemma  3. 

If $f$ is a continuous function and has a continuous derivative on $(a, b)$ and if $f(b) = 0$, $f'(b) > 0$,
and $f(a) > 0$, then there exists an $x \in (a, b)$ such that $f(x) = 0$. In addition, there exists a $y \in (a, b)$
such that $f'(y) = 0$.

Proof.

Under the above conditions, there exists a $\delta_0 > 0$ such that $f'(b - \delta) > 0$ for all $\delta$, $0 < \delta < \delta_0$.
Hence $f$ is monotone increasing on $(b - \delta_0, b)$. Thus there exists a $w < b$, such that $f(w) < 0$.
Since $f$ is continuous, there exists an $x \in (a, b)$ such that $f(x) = 0$.

Rolle's theorem provides the second part. Since $f(x) = f(b)$ there exists a $y \in (x, b)$ such
that $f'(y) = 0$.

APPENDIX B

LINEAR REGRESSION ANALYSIS


If the equations expressing the relationship between the undisturbed filter output and the
filter components are examined, it becomes clear that the problem can be formulated in terms
of linear regression theory and that its techniques can be applied directly.7 Now $y_1, \ldots, y_{2N-1}$
are $2N - 1$ independent random variables, all having variances $\sigma^2$ and with means given by the
so-called regression function

$$E[y_i] = z_i = \sum_{j=1}^{N} a_{ij} h_j, \qquad i = 1, \ldots, 2N - 1 \tag{B-1}$$

where the $(a_{i1}, \ldots, a_{iN})$, $i = 1, \ldots, 2N - 1$ are known vectors constructed from the $N$ input
components $s_1, \ldots, s_N$. If we use normal linear regression analysis techniques, the parameters
$h_1, \ldots, h_N$ can be estimated by considering them as regression coefficients to be determined.

Let $\hat{h}_i$ be an arbitrary unbiased linear estimator for $h_i$. Thus

$$\hat{h}_i = \sum_{j=1}^{2N-1} \alpha_{ij} y_j, \qquad i = 1, \ldots, N. \tag{B-2}$$

The unbiased requirement further implies

$$\sum_{j=1}^{2N-1} \alpha_{ij} \sum_{k=1}^{N} a_{jk} h_k = h_i \tag{B-3}$$

which, in turn, requires that the $\alpha_{ij}$ must satisfy

$$\sum_{j=1}^{2N-1} \alpha_{ij} a_{jk} = \delta_{ik} \tag{B-4}$$

where $\delta_{ik}$ is the Kronecker delta. If we desire a minimum variance estimator, we must minimize

$$\mathrm{var}[\hat{h}_i] = \sum_{j=1}^{2N-1} \alpha_{ij}^2 \sigma^2 \tag{B-5}$$

with respect to $\alpha_{ij}$ subject to Eq. (B-4).

Using  a  set  of  LaGrange  multipliers  to  include  the  constraints, 

N  /2N-1 


aaij 


var 


[£il-2T2  £  \k  £ 

k=l  \  j  =  l 


6. 


ik 


=  0 


i  =  i . N 

j  =  1 . 2N  -  1 


(B-6) 


N 


2a2a..-2cr2  £  X-k  a.k  =  0 
k=l 


i  =  1 . N 


iJ  ^  ik  jk  j  =  1,  .  .  .  ,  2N  -  1 


(B-7) 
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In  addition,  there  is  the  constraint  equation 


Thus 


2N- 1 


Ea  . .  a.,  =  6., 

ij  jk  lk 

j  =  1 


[Eq.  (B-4)] 


a 


ij 


N 


£  Xik  ajk 
k=l 


If 


we  multiply  this  result  by  a 


and  sum  over  j. 


2N- 1 


E 

j  =  1 


a. .  a., 

ij  J* 


2N-1  N 


^  2  xik  ajk  ajf 

j=l  k= 1 


N 


=  E  \kck, 

k=  1 


where 


=  6 


if 


(B-8) 


(B-9) 


2N-1 

Ckf  =  E  ajk  aj{  •  (B-10) 

j=1 

As  long  as  the  (a.^  .  ,  aiN)  are  linearly  independent,  it  is  clear  from  Eq.  (B-9)  that  the  solution 

for  the  is  given  by  the  matrix  equation 

[xik1  =  [cik]_1  =  Icik)  (b-u> 

where  is  the  element  in  the  i^  row  and  k^*1  column  of  [c^]  V  Consequently,  from  Eqs  (B-2), 
(B-8),  and  (B-ll),  the  minimum  variance  unbiased  linear  estimator  for  lr  is  the  linear  combina¬ 
tion  of  the  samples  given  by 


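$$\hat{h}_i = \sum_{j=1}^{2N-1} \sum_{k=1}^{N} c^{ik} a_{jk} y_j = \sum_{j=1}^{2N-1} \alpha_{ij} y_j , \tag{B-12}$$

where

$$\alpha_{ij} = \sum_{k=1}^{N} c^{ik} a_{jk} . \tag{B-13}$$

To find the variance of this estimate, we substitute Eqs. (B-4), (B-8), and (B-11) into Eq. (B-5).

$$\begin{aligned}
\operatorname{var}[\hat{h}_i] &= \sum_{j=1}^{2N-1} \alpha_{ij}^2 \sigma^2 \\
&= \sigma^2 \sum_{j=1}^{2N-1} \alpha_{ij} \sum_{k=1}^{N} c^{ik} a_{jk} \\
&= \sigma^2 \sum_{j=1}^{2N-1} \sum_{k=1}^{N} c^{ik} \alpha_{ij} a_{jk} \\
&= \sigma^2 \sum_{k=1}^{N} c^{ik} \delta_{ik} \\
&= \sigma^2 c^{ii} .
\end{aligned} \tag{B-14}$$

We may note that in the problem under consideration,

$$a_{jk} = \begin{cases} s_{j-k+1} , & k \le j \le N + k - 1 \\ 0 , & \text{otherwise} . \end{cases} \tag{B-15}$$

Thus Eq. (B-10) becomes

$$c_{k\ell} = \sum_{j=k}^{N+k-1} s_{j-k+1} s_{j-\ell+1} = \sum_{m=1}^{N} s_m s_{m+k-\ell} , \tag{B-16}$$

the autocorrelation function of the input signal. Should this be an impulse, then

$$c_{k\ell} = N \delta_{k\ell} , \tag{B-17}$$

$$c^{k\ell} = \frac{1}{N} \delta_{k\ell} , \tag{B-18}$$

and the weights for the terms in the linear regression formula will be

$$\alpha_{ij} = \sum_{k=1}^{N} \frac{1}{N} \delta_{ik} s_{j-k+1} = \frac{1}{N} s_{j-i+1} . \tag{B-19}$$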
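The chain from Eq. (B-10) to Eq. (B-14) can be checked numerically on a small case. The following is a minimal Python sketch and not part of the original report: the function names, the exact `Fraction` arithmetic, and the example input s = (1, 1) with N = 2 are all assumptions chosen for illustration. It builds the a_jk of Eq. (B-15), forms c_kl, inverts it, and computes the weights of Eq. (B-13).

```python
from fractions import Fraction

def conv_matrix(s):
    """a[j][k] = s_{j-k+1} for k <= j <= N+k-1 (1-based), else 0; Eq. (B-15)."""
    n = len(s)
    return [[Fraction(s[j - k]) if 0 <= j - k < n else Fraction(0)
             for k in range(n)] for j in range(2 * n - 1)]

def gram(a):
    """c[k][l] = sum_j a[j][k] * a[j][l]; Eq. (B-10)."""
    n = len(a[0])
    return [[sum(row[k] * row[l] for row in a) for l in range(n)]
            for k in range(n)]

def inverse(c):
    """Gauss-Jordan inverse of a small nonsingular matrix of Fractions."""
    n = len(c)
    aug = [list(row) + [Fraction(int(i == r)) for i in range(n)]
           for r, row in enumerate(c)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def weights(a):
    """Return (alpha, c_inv) with alpha[i][j] = sum_k c^{ik} a[j][k]; Eq. (B-13)."""
    c_inv = inverse(gram(a))
    n = len(c_inv)
    alpha = [[sum(c_inv[i][k] * a_row[k] for k in range(n)) for a_row in a]
             for i in range(n)]
    return alpha, c_inv

def estimate(alpha, y):
    """h_hat[i] = sum_j alpha[i][j] * y[j]; Eq. (B-12)."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in alpha]

# Hypothetical example: N = 2, input s = (1, 1), true parameters h = (3, -2).
s = [1, 1]
a = conv_matrix(s)
alpha, c_inv = weights(a)
h = [Fraction(3), Fraction(-2)]
y = [sum(a_row[k] * h[k] for k in range(len(h))) for a_row in a]  # noiseless output
h_hat = estimate(alpha, y)
```

In this example the autocorrelation is not an impulse (the off-diagonal element of c is s_1 s_2 = 1), so the weights differ from a scaled copy of the input. With no noise the estimate reproduces h, and the sum of squared weights for each parameter equals the matching diagonal element of the inverse, as Eq. (B-14) states.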


It is thus clear that the minimum variance unbiased linear estimate for $h_i$ is the cross correlation of the filter input with the noisy output given by

$$\hat{h}_i = \frac{1}{N} \sum_{j=i}^{N+i-1} s_{j-i+1} \, y_j . \tag{B-20}$$

The corresponding variance can be computed from Eqs. (B-14) and (B-18):

$$\operatorname{var}[\hat{h}_i] = \sigma^2 / N .$$

If the input does not have an impulse for an autocorrelation function, a modified form of the cross correlation is used. The pertinent coefficients are to be found in Eq. (B-13), which requires the inversion of the $c_{k\ell}$ matrix.
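As a rough illustration of when the plain cross correlation suffices, the Python sketch below applies the estimator of Eq. (B-20) to two hypothetical inputs. The function names and the inputs `impulse` and `flat` are assumptions for illustration only. Both inputs have energy N, but only the first has an impulse for its autocorrelation.

```python
def noiseless_output(s, h):
    """y_j = sum_k s_{j-k+1} h_k for j = 1..2N-1 (stored in 0-based lists)."""
    n = len(s)
    return [sum(s[j - k] * h[k] for k in range(n) if 0 <= j - k < n)
            for j in range(2 * n - 1)]

def autocorrelation(s, shift):
    """sum_m s_m s_{m+shift}, with s taken as zero outside its N samples."""
    n = len(s)
    return sum(s[m] * s[m + shift] for m in range(n) if 0 <= m + shift < n)

def cross_correlation_estimate(s, y):
    """h_hat_i = (1/N) sum_{j=i}^{N+i-1} s_{j-i+1} y_j; Eq. (B-20)."""
    n = len(s)
    return [sum(s[m] * y[i + m] for m in range(n)) / n for i in range(n)]

h = [1.0, -2.0, 0.5, 3.0]            # hypothetical true parameters, N = 4

impulse = [2.0, 0.0, 0.0, 0.0]       # autocorrelation is N at shift 0, else 0
est_impulse = cross_correlation_estimate(impulse, noiseless_output(impulse, h))

flat = [1.0, 1.0, 1.0, 1.0]          # autocorrelation is nonzero at other shifts
est_flat = cross_correlation_estimate(flat, noiseless_output(flat, h))
```

For the first input the cross correlation returns h exactly when there is no noise. For the second it does not, which is the situation in which the weights of Eq. (B-13) and the matrix inversion are needed.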




APPENDIX  C 

WEAKENING  THE  DIFFERENTIAL  BIAS  ASSUMPTION 


In this appendix, we show that the differential bias assumption can be weakened to a condition on the sum of the individual biases, if the probability density of the noise has a certain general property. Specifically, we assume that $p_n(n)$ is upper bounded by a function of the form $A e^{-\alpha |n|}$, where $A$ and $\alpha$ are any positive constants. Then we show that the moment generating functions obtained under the differential bias assumption can also be obtained under the more general condition

$$\sum_{i=1}^{k} |\delta_i| > k\delta$$

for all $k$, where $\delta$ is a constant and the $\{\delta_i\}$ are the differential biases of the incorrect branches involved.
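A small worked case may help show in what sense the sum condition is weaker. It assumes, as a reading not restated in this appendix, that the differential bias assumption requires every individual $|\delta_i|$ to exceed $\delta$, with $\delta > 0$. The three bias values below are hypothetical. The second branch violates the per-branch requirement, yet the sum condition holds for $k = 1, 2, 3$.

```latex
|\delta_1| = 3\delta, \quad |\delta_2| = \tfrac{1}{2}\delta, \quad |\delta_3| = \delta
\;\Longrightarrow\;
\sum_{i=1}^{1} |\delta_i| = 3\delta > \delta, \quad
\sum_{i=1}^{2} |\delta_i| = \tfrac{7}{2}\delta > 2\delta, \quad
\sum_{i=1}^{3} |\delta_i| = \tfrac{9}{2}\delta > 3\delta .
```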

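First, from Eq. (13),

$$\gamma_i^{(1)}(t) = e^{tR} \int_{-\infty}^{\infty} p_n(x) \, [p_n(x + \delta_i)]^{t} \, dx , \qquad t > 0 .$$

Thus, from the hypothesis above, assuming that $\delta_i$ is positive (the case of $\delta_i$ negative follows in a similar manner),

$$\begin{aligned}
\gamma_i^{(1)}(t) &\le A^{1+t} e^{tR} \left[ \int_{-\infty}^{-\delta_i} e^{\alpha x} e^{\alpha (x+\delta_i) t} \, dx + \int_{-\delta_i}^{0} e^{\alpha x} e^{-\alpha (x+\delta_i) t} \, dx + \int_{0}^{\infty} e^{-\alpha x} e^{-\alpha (x+\delta_i) t} \, dx \right] \\
&= \frac{A^{1+t} e^{tR}}{\alpha} \left[ e^{-\alpha \delta_i} \left( \frac{1}{1+t} + \frac{1}{t-1} \right) + e^{-\alpha \delta_i t} \left( \frac{1}{1-t} + \frac{1}{1+t} \right) \right] , \qquad t \ne 1 \\
&\le \begin{cases} \dfrac{2 e^{tR} A^{1+t} e^{-\alpha \delta_i t}}{\alpha (1 - t^2)} , & 0 < t < 1 \\[2ex] \dfrac{2 e^{tR} A^{1+t} e^{-\alpha \delta_i}}{\alpha (t - 1)} , & 1 < t \end{cases}
\end{aligned}$$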


for $\delta_i > 0$, and the sign of the exponent is reversed if $\delta_i < 0$. Thus, in either case, the dependence on $|\delta_i|$ is exponential. When we consider the moment generating function of the sum of the metric on many incorrect branches as in Eq. (13), we take the product of the moment generating functions. Thus the required moment generating function is bounded by an expression that is exponential in the sum of the magnitudes of the corresponding individual biases. In particular, the bound is of the form

$$\begin{cases} f_1(t) \exp\left[ -\alpha t \sum_{i=1}^{k} |\delta_i| \right] , & 0 < t < 1 \\[2ex] f_2(t) \exp\left[ -\alpha \sum_{i=1}^{k} |\delta_i| \right] , & 1 < t . \end{cases}$$

Thus if

$$\sum_{i=1}^{k} |\delta_i| > k\delta \quad \text{for all } k ,$$

the required moment generating function is bounded by

$$\begin{cases} f_1(t) \, e^{-\alpha t k \delta} , & 0 < t < 1 \\ f_2(t) \, e^{-\alpha k \delta} , & 1 < t \end{cases}$$

as before.
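The single-branch bound used in this argument can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the report's own computation: it takes the density bound as exp(-alpha |x|) with the common factor A^(1+t) e^(tR) left out, restricts to a positive bias `delta`, and uses hypothetical function names. It compares a midpoint-rule integral with the closed form obtained by integrating over the three regions x < -delta, -delta < x < 0, and x > 0, and with the simplified bounds for 0 < t < 1 and t > 1.

```python
import math

def bracket_closed_form(alpha, delta, t):
    """Sum of the three regional integrals for delta > 0 and t != 1."""
    e1 = math.exp(-alpha * delta)
    et = math.exp(-alpha * delta * t)
    return (e1 * (1 / (1 + t) + 1 / (t - 1))
            + et * (1 / (1 - t) + 1 / (1 + t))) / alpha

def bracket_numeric(alpha, delta, t, half_width=40.0, steps=200_000):
    """Midpoint-rule integral of exp(-alpha|x|) * exp(-alpha|x + delta| t)."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += math.exp(-alpha * abs(x) - alpha * abs(x + delta) * t)
    return total * h

def bracket_bound(alpha, delta, t):
    """Simplified bound: exponential in delta*t for t < 1, in delta for t > 1."""
    if 0 < t < 1:
        return 2 * math.exp(-alpha * delta * t) / (alpha * (1 - t * t))
    if t > 1:
        return 2 * math.exp(-alpha * delta) / (alpha * (t - 1))
    raise ValueError("t must be positive and different from 1")
```

For example, `bracket_closed_form(1.0, 1.0, 0.5)` and `bracket_numeric(1.0, 1.0, 0.5)` should agree closely, and both should lie below `bracket_bound(1.0, 1.0, 0.5)`.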


APPENDIX D

AVERAGE  REFLECTION  FROM  DEPTH  N 


In this section, we make a calculation to indicate roughly the size of the effect one is trying to observe in the geophysical model described in Sec. II-E. In particular, we consider the magnitude of a reflection from a discontinuity at depth N. However, in order not to specialize the result to a particular sequence of impedance values, we compute the average reflection over the ensemble of all such sequences. In addition, the result is confined to the initial return, since this is the one used to make the initial decision on an impedance and thus determines whether the estimate will require correction.

Assume first that each impedance value is chosen independently and that the impedance at depth $n$ is $Z_A$ or $Z_B$ with probability one half. Assume also that $Z_1 = Z_A$. Then there are four possibilities for the discontinuity at depth $N$:

         $Z_{N-1}$    $Z_N$

  (1)    $Z_A$        $Z_A$
  (2)    $Z_A$        $Z_B$
  (3)    $Z_B$        $Z_A$
  (4)    $Z_B$        $Z_B$


The  first  and  the  last  give  no  reflection,  while  the  other  two  do  so.  We  thus  consider  (2)  and 
(3)  for  reflections. 

Suppose that $Z_{N-1} = Z_A$ and $Z_N = Z_B$. Since $Z_1$, $Z_{N-1}$, and $Z_N$ are determined, there are $N - 3$ impedance values to be chosen at random. But for $K$ sections there are $K + 1$ transitions between them, and thus there are $N - 2$ locations for possible reflection. In the case under current consideration, $Z_1$ and $Z_{N-1}$ are both $Z_A$. Thus, if there are $n$ transitions from A to B, there are $n$ transitions back from B to A. These $2n$ transitions can be arranged among the $N - 2$ potential transition locations in $\binom{N-2}{2n}$ ways. Thus the probability of $n$ transitions from A to B, assuming that $Z_1$ and $Z_{N-1}$ are both $Z_A$, is $\binom{N-2}{2n} \, 2^{-(N-3)}$.
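The counting argument above can be confirmed by brute force for a small depth. The Python sketch below is only an illustration: the function name and the choice N = 8 are assumptions. It enumerates every assignment of the N - 3 free impedance values between two fixed $Z_A$ endpoints and tallies the number of A to B transitions.

```python
from itertools import product
from math import comb

def transition_counts(n_depth):
    """Tally sequences Z_1..Z_{N-1} with both ends 'A' by their A-to-B transitions."""
    free = n_depth - 3                      # impedance values chosen at random
    counts = {}
    for middle in product("AB", repeat=free):
        seq = ("A",) + middle + ("A",)      # fixed endpoints Z_1 and Z_{N-1}
        a_to_b = sum(1 for x, y in zip(seq, seq[1:]) if (x, y) == ("A", "B"))
        counts[a_to_b] = counts.get(a_to_b, 0) + 1
    return counts

counts = transition_counts(8)               # N = 8: 5 free values, 6 locations
probabilities = {n: c / 2 ** (8 - 3) for n, c in counts.items()}
```

Each tally should match the binomial coefficient with N - 2 = 6 locations and 2n transitions, and the probabilities should sum to one.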

If  there  are  n  A  to  B  transitions,  and  n  B  to  A  transitions  for  a  signal  moving  in  the 
direction  of  increasing  depth,  there  are  the  same  number  of  each  in  the  direction  of  decreasing 
depth.  Hence  the  transmission  coefficient  for  an  input  pulse  along  a  path  to  discontinuity  at 
depth  N  from  A  to  B  with  n  other  A  to  B  transitions  along  the  way  is 


$$T_{AB}^{2n} \, T_{BA}^{2n} \, \Gamma_{AB} ,$$

where $T_{AB}$ is the transmission coefficient for a discontinuity from A to B, $T_{BA}$ is the same quantity for the reverse case, and $\Gamma_{AB}$ is the reflection coefficient for an A to B junction.

Thus,  over  the  ensemble  of  impedance  value  sets,  the  average  transmission  coefficient  is 


$$\bar{T} = \sum_{n=0}^{(N-2)/2} \binom{N-2}{2n} T_{AB}^{2n} \, T_{BA}^{2n} \, \Gamma_{AB} \, 2^{-(N-3)}$$

for $N$ even. If this sum is carried out by using the binomial expansion,

$$\bar{T} = 2^{-(N-2)} \left[ \left( 1 + \frac{4r}{(1+r)^2} \right)^{N-2} + \left( 1 - \frac{4r}{(1+r)^2} \right)^{N-2} \right] \frac{(1-r)}{(1+r)} ,$$

where $r$ is the ratio of the impedances $Z_A$ and $Z_B$. The cases of $N$ odd and the reversed discontinuity at depth $N$ follow in a similar manner and give essentially the same result.
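The step from the sum to the closed form can be checked numerically. The Python sketch below is a hedged illustration: the function names are hypothetical, `x` stands for the product of the two transmission coefficients, and `coefficients_from_ratio` assumes the convention in which that product equals 4r/(1+r)^2 and the reflection coefficient equals (1-r)/(1+r), as the closed form suggests.

```python
from math import comb

def average_return_sum(n_depth, x, gamma):
    """Direct sum over n of C(N-2, 2n) x^(2n) gamma 2^-(N-3), for N even."""
    m = n_depth - 2
    total = sum(comb(m, 2 * n) * x ** (2 * n) for n in range(m // 2 + 1))
    return total * gamma * 2.0 ** -(n_depth - 3)

def average_return_closed(n_depth, x, gamma):
    """Closed form 2^-(N-2) [(1 + x)^(N-2) + (1 - x)^(N-2)] gamma."""
    m = n_depth - 2
    return 2.0 ** -m * ((1 + x) ** m + (1 - x) ** m) * gamma

def coefficients_from_ratio(r):
    """Assumed convention: transmission product 4r/(1+r)^2, reflection (1-r)/(1+r)."""
    return 4 * r / (1 + r) ** 2, (1 - r) / (1 + r)

x, gamma = coefficients_from_ratio(0.5)     # hypothetical impedance ratio
returns = {n: average_return_closed(n, x, gamma) for n in (4, 10, 20)}
```

For this hypothetical ratio the two forms agree, and the magnitude of the average return shrinks as the depth grows.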

Since $r$ is positive, the first term in the square brackets is the most significant for large $N$. Thus the return is exponentially decreasing with $N$, for $N$ large.